Abstract. Exchanging mutable data objects with untrusted code is a delicate matter because
of the risk of creating a data space that is accessible by an attacker. Consequently,
secure programming guidelines for Java stress the importance of using defensive copying
before accepting or handing out references to an internal mutable object. However,
implementation of a copy method (like clone()) is entirely left to the programmer. It may
not provide a sufficiently deep copy of an object and is subject to overriding by a
malicious sub-class. Currently no language-based mechanism supports secure object cloning.
This paper proposes a type-based annotation system for defining modular copy policies 
for class-based object-oriented programs. A copy policy specifies the maximally allowed 
sharing between an object and its clone. We present a static enforcement mechanism that 
will guarantee that all classes fulfil their copy policy, even in the presence of overriding of 
copy methods, and establish the semantic correctness of the overall approach in Coq. The 
mechanism has been implemented and experimentally evaluated on clone methods from 
several Java libraries. 



1. Introduction 

Exchanging data objects with untrusted code is a delicate matter because of the risk of 
creating a data space that is accessible by an attacker. Consequently, secure programming 
guidelines for Java such as those proposed by Sun [17] and CERT [6] stress the importance of
using defensive copying or cloning before accepting or handing out references to an internal 
mutable object. There are two aspects of the problem: 

(1) If the result of a method is a reference to an internal mutable object, then the receiving 
code may modify the internal state. Therefore, it is recommended to make copies of 
mutable objects that are returned as results, unless the intention is to share state. 

(2) If an argument to a method is a reference to an object coming from hostile code, a 
local copy of the object should be created. Otherwise, the hostile code may be able to 
modify the internal state of the object. 
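
The two recommendations above can be sketched in Java as follows. This is a minimal
illustration and is not taken from the cited guidelines: the class name Appointment and
its members are hypothetical. The copy is made with the Date(long) constructor rather than
clone(), so that the result is always a plain java.util.Date.

```java
import java.util.Date;

// Hypothetical class, used only to illustrate defensive copying.
final class Appointment {
    private final Date start;

    // Aspect (2): copy an argument coming from possibly hostile code before storing it.
    Appointment(Date start) {
        this.start = new Date(start.getTime());
    }

    // Aspect (1): hand out a copy, never the internal mutable object.
    Date getStart() {
        return new Date(start.getTime());
    }
}
```

With this sketch, neither mutating the Date passed to the constructor nor mutating the
Date returned by getStart() changes the state held by the Appointment object.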

A common way for a class to provide facilities for copying objects is to implement a clone()
method that overrides the cloning method provided by java.lang.Object. The following
code snippet, taken from Sun's Secure Coding Guidelines for Java, demonstrates how a
date object is cloned before being returned to a caller:

public class CopyOutput {

    private final java.util.Date date;

    public java.util.Date getDate() {
        return (java.util.Date) date.clone(); }

}

However, relying on calling a polymorphic clone method to ensure secure copying of objects
may prove insufficient, for two reasons. First, the implementation of the clone() method
is entirely left to the programmer and there is no way to enforce that an untrusted
implementation provides a sufficiently deep copy of the object. It is free to leave references
to parts of the original object being copied in the new object. Second, even if the current
clone() method works properly, sub-classes may override the clone() method and replace
it with a method that does not create a sufficiently deep clone. For the above example
to behave correctly, an additional class invariant is required, ensuring that the date field
always contains an object that is of class Date and not one of its sub-classes. To quote from
the CERT guidelines for secure Java programming: "Do not carry out defensive copying
using the clone() method in constructors, when the (non-system) class can be subclassed
by untrusted code. This will limit the malicious code from returning a crafted object when
the object's clone() method is invoked." Clearly, we are faced with a situation where basic
object-oriented software engineering principles (sub-classing and overriding) are at odds
with security concerns. To reconcile these two aspects in a manner that provides
semantically well-founded guarantees of the resulting code, this paper proposes a formalism for
defining cloning policies by annotating classes and specific copy methods, and a static
enforcement mechanism that will guarantee that all classes of an application adhere to the
copy policy. Intuitively, policies impose non-sharing constraints between the structure
referenced by a field of an object and the structure returned by the cloning method. Notice that
we do not enforce that a copy method will always return a target object that is functionally
equivalent to its source. Nor does our method prevent a sub-class from making a copy of a
structure using new fields that are not governed by the declared policy. For a more detailed
example of these limitations, see Section 2.3.
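
The second problem (overriding by a sub-class) can be reproduced with a few lines of Java.
The sketch below is an illustration only: the names LeakyDate and CopyOutputDemo are
hypothetical, and CopyOutputDemo mirrors the CopyOutput snippet with a constructor added so
that it can be run.

```java
import java.util.Date;

// Hypothetical attacker-controlled sub-class: clone() returns the receiver itself.
class LeakyDate extends Date {
    LeakyDate(long time) { super(time); }

    @Override
    public Object clone() {
        return this; // no copy is made
    }
}

// Same shape as the CopyOutput example, with a constructor added for the demonstration.
class CopyOutputDemo {
    private final Date date;

    CopyOutputDemo(Date date) { this.date = date; }

    Date getDate() {
        return (Date) date.clone(); // virtual call: dispatches to LeakyDate.clone()
    }
}
```

If a LeakyDate instance is stored in the date field, getDate() hands out the internal object
itself, so the caller can modify the state that the clone() call was meant to protect.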



1.1. Cloning of Objects. For objects in Java to be cloneable, their class must implement
the empty interface Cloneable. A default clone method is provided by the class Object:
when invoked on an object of a class, Object.clone will create a new object of that class
and copy the content of each field of the original object into the new object. The object
and its clone share all sub-structures of the object; such a copy is called shallow.
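
As a small illustration of this default behaviour, the following sketch (the class Cell is
hypothetical) relies only on Object.clone and shows that the array referenced by the field
is shared between the object and its clone.

```java
// Hypothetical class illustrating the default, shallow Object.clone().
class Cell implements Cloneable {
    int[] data = {1, 2, 3};

    @Override
    public Cell clone() {
        try {
            return (Cell) super.clone(); // field-by-field copy: data is not duplicated
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // cannot happen: Cell implements Cloneable
        }
    }
}
```

After b = a.clone(), the objects a and b are distinct, but a.data and b.data denote the same
array, so a write through b.data is visible through a.data.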

It is common for cloneable classes to override the default clone method and provide 
their own implementation. For a generic List class, this could be done as follows: 

public class List<V> implements Cloneable
{
    public V value;
    public List<V> next;

    public List(V val, List<V> next) {
        this.value = val;
        this.next = next; }

    public List<V> clone() {
        return new List<V>(value, (next == null) ? null : next.clone()); }

}

Notice that this cloning method performs a shallow copy of the list, duplicating the spine but 
sharing all the elements between the list and its clone. Because this amount of sharing may 
not be desirable (for the reasons mentioned above), the programmer is free to implement 
other versions of clone(). For example, another way of cloning a list is by copying both
the list spine and its elements,¹ creating what is known as a deep copy.

public List<V> deepClone() {
    return new List<V>((V) value.clone(),
                       (next == null ? null : next.deepClone())); }
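
For readers who want to compile such a deep copy, the sketch below is one possible way to
do it. It is an assumption-laden illustration, not the paper's code: the interface
PublicCloneable, the class DList and the element class Box are hypothetical names, and the
cast fails at run time if an element does not implement PublicCloneable.

```java
// Hypothetical sub-interface of Cloneable that exposes a public clone() method.
interface PublicCloneable extends Cloneable {
    Object clone();
}

// Hypothetical list class with a deep copy method.
class DList<V> implements Cloneable {
    V value;
    DList<V> next;

    DList(V value, DList<V> next) {
        this.value = value;
        this.next = next;
    }

    // Deep copy: duplicates the spine and every element.
    @SuppressWarnings("unchecked")
    DList<V> deepClone() {
        V copy = (V) ((PublicCloneable) value).clone(); // cast needed to call clone()
        return new DList<>(copy, next == null ? null : next.deepClone());
    }
}

// Hypothetical element type used to exercise deepClone().
class Box implements PublicCloneable {
    int n;
    Box(int n) { this.n = n; }
    @Override public Box clone() { return new Box(n); }
}
```

On a two-element DList of Box objects, deepClone() returns a list whose cells and elements
are all distinct from those of the original while carrying the same integer contents.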

A general programming pattern for methods that clone objects works by first creating a
shallow copy of the object by calling the super.clone() method, and then modifying certain
fields to reference new copies of the original content. This is illustrated in the following
snippet, taken from the class LinkedList in Fig. 8:

public Object clone() { ...
    clone = super.clone(); ...
    clone.header = new Entry<E>(null, null, null); ...
    return clone; }

There are two observations to be made about the analysis of such methods. First, an 
analysis that tracks the depth of the clone being returned will have to be flow-sensitive, 
as the method starts out with a shallow copy that is gradually being made deeper. This 
makes the analysis more costly. Second, there is no need to track precisely modifications 
made to parts of the memory that are not local to the clone method, as clone methods are 
primarily concerned with manipulating memory that they allocate themselves. This will 
have a strong impact on the design choices of our analysis. 
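
The pattern, and the reason why flow-sensitivity matters, can be seen on a complete (if
small) class. The sketch below is an illustration with a hypothetical class IntStack: right
after super.clone() the copy still shares its array with the original, and only the next
assignment makes it deep.

```java
import java.util.Arrays;

// Hypothetical class following the "super.clone(), then replace fields" pattern.
class IntStack implements Cloneable {
    private int[] items = new int[4];
    private int size = 0;

    void push(int x) {
        if (size == items.length) items = Arrays.copyOf(items, 2 * size);
        items[size++] = x;
    }

    int peek() { return items[size - 1]; }

    void setTop(int x) { items[size - 1] = x; }

    @Override
    public IntStack clone() {
        try {
            IntStack copy = (IntStack) super.clone(); // shallow: copy.items == this.items
            copy.items = items.clone();               // from here on, items is no longer shared
            return copy;
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // cannot happen: IntStack implements Cloneable
        }
    }
}
```

An analysis that only looked at the state after the first statement would have to reject
this method; tracking the later assignment to the locally allocated copy is what allows it
to be accepted.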

1.2. Copy Policies. The first contribution of the paper is a proposal for a set of
semantically well-defined program annotations, whose purpose is to enable the expression of policies
for secure copying of objects. Introducing a copy policy language enables class developers
to state explicitly the intended behaviour of copy methods. In the basic form of the copy
policy formalism, fields of classes are annotated with @Shallow and @Deep. Intuitively, the
annotation @Shallow indicates that the field is referencing an object, parts of which may
be referenced from elsewhere. The annotation @Deep(X) on a field f means that a) upon
return from clone(), the object referenced by this field f is not referenced from elsewhere,
and b) the field f is copied according to the copy policy identified by X. Here, X is either
the name of a specific policy or, if omitted, it designates the default policy of the class of
the field. For example, the following annotations:



1 To be type-checked by the Java compiler it is necessary to add a cast before calling clone() on value.
A cast to a sub-interface of Cloneable that declares a clone() method is necessary.

class List { @Shallow V value; @Deep List next; ... }

specifies a default policy for the class List where the next field points to a list object that
also respects the default copy policy for lists. Any method in the List class, labelled with
the @Copy annotation, is meant to respect this default policy.

In addition it is possible to define other copy policies and annotate specific copy methods
(identified by the annotation @Copy(...)) with the name of these policies. For example, the
annotation²

DL: { @Deep V value; @Deep(DL) List next; };
@Copy(DL) List<V> deepClone() {
    return new List<V>((V) value.clone(),
                       (next == null ? null : next.deepClone())); }

can be used to specify a list-copying method that also ensures that the value fields of a list 
of objects are copied according to the copy policy of their class (which is a stronger policy 
than that imposed by the annotations of the class List). We give a formal definition of the 
policy annotation language in Section 2.
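
To give a feel for how such annotations could be written in plain Java, here is a minimal
sketch. It is an assumption, not the syntax of the paper's implementation: the annotation
types Shallow, Deep and Copy, the class AnnotatedList and the helper PolicyReader are all
hypothetical, and the empty string is used here to stand for "the default policy".

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical annotation types; "" stands for the default policy of the class.
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD)
@interface Shallow {}

@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD)
@interface Deep { String value() default ""; }

@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD)
@interface Copy { String value() default ""; }

// The default List policy of this section, written with the hypothetical annotations.
class AnnotatedList<V> {
    @Shallow V value;
    @Deep AnnotatedList<V> next;

    AnnotatedList(V value, AnnotatedList<V> next) {
        this.value = value;
        this.next = next;
    }

    @Copy
    AnnotatedList<V> copy() {
        return new AnnotatedList<>(value, next == null ? null : next.copy());
    }
}

// Reads back the declared policy: the sorted names of the fields marked @Deep.
class PolicyReader {
    static List<String> deepFields(Class<?> c) {
        List<String> names = new ArrayList<>();
        for (Field f : c.getDeclaredFields()) {
            if (f.isAnnotationPresent(Deep.class)) names.add(f.getName());
        }
        Collections.sort(names);
        return names;
    }
}
```

The annotations only record the intent; nothing in this sketch checks that copy() actually
respects it. That check is the role of the enforcement mechanism described in the paper.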

The annotations are meant to ensure a certain degree of non-sharing between the original
object being copied and its clone. We want to state explicitly that the parts of the clone
that can be accessed via fields marked @Deep are unreachable from any part of the heap
that was accessible before the call to clone(). To make this intention precise, we provide a
formal semantics of a simple programming language extended with policy annotations and
define what it means for a program to respect a policy (Section 2.2).



1.3. Enforcement. The second major contribution of this work is to make the developer's 
intent, expressed by copy policies, statically enforceable using a type system. We formalize 
this enforcement mechanism by giving an interpretation of the policy language in which 
annotations are translated into graph-shaped type structures. For example, the default 
annotations of the List class defined above will be translated into the graph that is depicted 
to the right in Fig. 1 (res is the name given to the result of the copy method). The left
part shows the concrete heap structure.

Unlike general purpose shape analysis, we take into account the programming methodologies
and practice for copy methods, and design a type system specifically tailored to
the enforcement of copy policies. This means that the underlying analysis must be able to
track precisely all modifications to objects that the copy method allocates itself (directly or
indirectly) in a flow-sensitive manner. Conversely, as copy methods should not modify
non-local objects, the analysis will be designed to be more approximate when tracking objects
external to the method under analysis, and the type system will accordingly refuse methods 
that attempt such non-local modifications. As a further design choice, the annotations are 
required to be verifiable modularly on a class-by-class basis without having to perform an 
analysis of the entire code base, and at a reasonable cost. 

As depicted in Fig. 1, concrete memory cells are either abstracted as a) $\top_{out}$ when they
are not allocated in the copy method itself (or its callee); b) $\top$ when they are just marked as
maybe-shared; and c) circle nodes of a deterministic graph when they are locally allocated
and not shared. A double circle furthermore expresses a singleton concretization. In this
example, the abstract heap representation matches the graph interpretation of annotations,
which means that the instruction set that produced this heap state satisfies the specified
copy policy.

2 Our implementation uses a slightly different policy declaration syntax because of the limitations imposed
by the Java annotation language.

Figure 1. A linked structure (left part) and its abstraction (right part).

Technically, the intra-procedural component of our analysis corresponds to heap shape
analysis with the particular type of graphs that we have defined. Operations involving
non-local parts of the heap are rapidly discarded. Inter-procedural analysis uses the signatures
of copy methods provided by the programmer. Inheritance is dealt with by stipulating that
inherited fields retain their "shallow/deep" annotations. Redefinition of a method must
respect the same copy policy and other copy methods can be added to a sub-class. The
detailed definition of the analysis, presented as a set of type inference rules, is given in
Section 3.

This article is an extended version of a paper presented at ESOP'11 [13]. We have
taken advantage of the extra space to provide improved and more detailed explanations,
in particular of the inference mechanism and of what exactly is being enforced by our
copy policies. We have also added details of the proof of correctness of the enforcement
mechanism. The formalism of copy policies and the correctness theorem for the core
language defined in Section 2 have been implemented and verified mechanically in Coq [1].
The added details about the proofs should especially facilitate the understanding of this
Coq development.

2. Language and Copy Policies 

The formalism is developed for a small, imperative language extended with basic,
class-based object-oriented features for object allocation, field access and assignment, and method
invocation. A program is a collection of classes, organized into a tree-structured class
hierarchy via the extends relation. A class consists of a series of copy method declarations,
each with its own policy X, its name m, its formal parameter x and commands c to execute.
A sub-class inherits the copy methods of its super-class and can re-define a copy method
defined in one of its super-classes. We only consider copy methods. Private methods (or
static methods of the current class) are inlined by the type checker. Other method calls
(to virtual methods) are modeled by a special instruction x:=?(y) that assigns an arbitrary
value to x and possibly modifies all heap cells reachable from y (except itself). The other
commands are standard. The copy method call $x := m_{cn:X}(y)$ is a virtual call. The method
to be called is the copy method of name m defined or inherited by the (dynamic) class of the
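
\[
x, y \in \mathit{Var} \qquad f \in \mathit{Field} \qquad m \in \mathit{Meth} \qquad
cn \in \mathit{Class}_{id} \qquad X \in \mathit{Policy}_{id}
\]
\[
\begin{array}{rcll}
p  & \in & \mathit{Prog}       & ::= \overline{cl} \\
cl & \in & \mathit{Class}      & ::= \mathtt{class}\ cn\ [\mathtt{extends}\ cn]\ \{\overline{pd}\ \overline{md}\} \\
pd & \in & \mathit{PolicyDecl} & ::= X : \{\tau\} \\
\tau & \in & \mathit{Policy}   & ::= \overline{(X, f)} \\
md & \in & \mathit{MethDecl}   & ::= \mathtt{Copy}(X)\ m(x) := c \\
c  & \in & \mathit{Comm}       & ::= x := y \mid x := y.f \mid x.f := y \mid x := \mathtt{null} \\
   &     &                     & \ \mid\ x := \mathtt{new}\ cn \mid x := m_{cn:X}(y) \mid x := ?(y) \mid \mathtt{return}\ x \\
   &     &                     & \ \mid\ c ; c \mid \mathtt{if}\ (*)\ \mathtt{then}\ c\ \mathtt{else}\ c\ \mathtt{fi} \mid \mathtt{while}\ (*)\ \mathtt{do}\ c\ \mathtt{done}
\end{array}
\]

Notations: We write $\preceq$ for the reflexive transitive closure of the subclass relation induced by a
(well-formed) program that is fixed in the rest of the paper. We write $\overline{x}$ for a sequence of syntactic
elements of form x.

Figure 2. Language Syntax.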



object stored in variable y. The subscript annotation cn:X is used as a static constraint. It 
is supposed that the type of y is guaranteed to be a sub-class of class cn and that cn defines 
a method m with a copy policy X. This is ensured by standard bytecode verification and 
method resolution. 

We suppose given a set of policy identifiers $\mathit{Policy}_{id}$, ranged over by X. A copy policy
declaration has the form $X : \{\tau\}$ where X is the identifier of the policy signature and
$\tau$ is a policy. The policy $\tau$ consists of a set of field annotations $(X, f); \ldots$ where f is
a deep field that should reference an object which can only be accessed via the returned
pointer of the copy method and which respects the copy policy identified by X. The use of
policy identifiers makes it possible to write recursive definitions of copy policies, necessary
for describing copy properties of recursive structures. Any other field is implicitly shallow,
meaning that no copy properties are guaranteed for the object referenced by the field.
No further copy properties are given for the sub-structure starting at shallow fields. For
instance, the default copy policy declaration of the class List presented in Sec. 1.2 writes:
List.default : {(List.default, next)}.

We assume that for a given program, all copy policies have been grouped together in a
finite map $\Pi_p : \mathit{Policy}_{id} \to \mathit{Policy}$. In the rest of the paper, we assume this map is complete,
i.e. each policy name X that appears in an annotation is bound to a unique policy in the
program p.

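The semantic model of the language defined here is store-based:

\[
\begin{array}{rcll}
l & \in & \mathit{Loc} & \\
v & \in & \mathit{Val} & = \mathit{Loc} \cup \{\diamond\} \\
\rho & \in & \mathit{Env} & = \mathit{Var} \to \mathit{Val} \\
o & \in & \mathit{Object} & = \mathit{Field} \to \mathit{Val} \\
h & \in & \mathit{Heap} & = \mathit{Loc} \rightharpoonup_{fin} (\mathit{Class}_{id} \times \mathit{Object}) \\
\langle\rho, h, A\rangle & \in & \mathit{State} & = \mathit{Env} \times \mathit{Heap} \times \mathcal{P}(\mathit{Loc})
\end{array}
\]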



A program state consists of an environment $\rho$ of local variables, a store h of locations
mapping³ to objects in a heap and a set A of locally allocated locations, i.e., the locations

3 We write $\rightharpoonup_{fin}$ for partial functions on finite domains.
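
\[
(x := y,\ \langle\rho,h,A\rangle) \leadsto \langle\rho[x \mapsto \rho(y)], h, A\rangle
\qquad
(x := \mathtt{null},\ \langle\rho,h,A\rangle) \leadsto \langle\rho[x \mapsto \diamond], h, A\rangle
\]
\[
\frac{\rho(y) \in dom(h)}
     {(x := y.f,\ \langle\rho,h,A\rangle) \leadsto \langle\rho[x \mapsto h(\rho(y), f)], h, A\rangle}
\qquad
\frac{\rho(x) \in dom(h)}
     {(x.f := y,\ \langle\rho,h,A\rangle) \leadsto \langle\rho, h[(\rho(x), f) \mapsto \rho(y)], A\rangle}
\]
\[
\frac{l \notin dom(h)}
     {(x := \mathtt{new}\ cn,\ \langle\rho,h,A\rangle) \leadsto \langle\rho[x \mapsto l], h[l \mapsto (cn, o_\diamond)], A \cup \{l\}\rangle}
\qquad
(\mathtt{return}\ x,\ \langle\rho,h,A\rangle) \leadsto \langle\rho[ret \mapsto \rho(x)], h, A\rangle
\]
\[
\frac{\begin{array}{c}
h(\rho(y)) = (cn_y, \_) \qquad lookup(cn_y, m) = (\mathtt{Copy}(X')\ m(a) := c) \qquad cn_y \preceq cn \\
(c,\ \langle\rho_\diamond[a \mapsto \rho(y)], h, \emptyset\rangle) \leadsto \langle\rho', h', A'\rangle
\end{array}}
     {(x := m_{cn:X}(y),\ \langle\rho,h,A\rangle) \leadsto \langle\rho[x \mapsto \rho'(ret)], h', A \cup A'\rangle}
\]
\[
\frac{\begin{array}{c}
dom(h) \subseteq dom(h') \qquad \forall l \in dom(h) \setminus Reach_h(\rho(y)),\ h(l) = h'(l) \\
\forall l \in dom(h) \setminus Reach_h(\rho(y)),\ \forall l' \in dom(h'),\ l \in Reach_{h'}(l') \Rightarrow l' \in dom(h) \setminus Reach_h(\rho(y)) \\
v \in \{\diamond\} + Reach_h(\rho(y)) \cup (dom(h') \setminus dom(h))
\end{array}}
     {(x := ?(y),\ \langle\rho,h,A\rangle) \leadsto \langle\rho[x \mapsto v], h', A \setminus Reach^{+}_h(\rho(y))\rangle}
\]
\[
\frac{(c_1,\ \langle\rho,h,A\rangle) \leadsto \langle\rho_1,h_1,A_1\rangle \qquad
      (c_2,\ \langle\rho_1,h_1,A_1\rangle) \leadsto \langle\rho_2,h_2,A_2\rangle}
     {(c_1 ; c_2,\ \langle\rho,h,A\rangle) \leadsto \langle\rho_2,h_2,A_2\rangle}
\]
\[
\frac{(c_1,\ \langle\rho,h,A\rangle) \leadsto \langle\rho_1,h_1,A_1\rangle}
     {(\mathtt{if}\ (*)\ \mathtt{then}\ c_1\ \mathtt{else}\ c_2\ \mathtt{fi},\ \langle\rho,h,A\rangle) \leadsto \langle\rho_1,h_1,A_1\rangle}
\qquad
\frac{(c_2,\ \langle\rho,h,A\rangle) \leadsto \langle\rho_2,h_2,A_2\rangle}
     {(\mathtt{if}\ (*)\ \mathtt{then}\ c_1\ \mathtt{else}\ c_2\ \mathtt{fi},\ \langle\rho,h,A\rangle) \leadsto \langle\rho_2,h_2,A_2\rangle}
\]
\[
(\mathtt{while}\ (*)\ \mathtt{do}\ c\ \mathtt{done},\ \langle\rho,h,A\rangle) \leadsto \langle\rho,h,A\rangle
\qquad
\frac{(c ; \mathtt{while}\ (*)\ \mathtt{do}\ c\ \mathtt{done},\ \langle\rho,h,A\rangle) \leadsto \langle\rho',h',A'\rangle}
     {(\mathtt{while}\ (*)\ \mathtt{do}\ c\ \mathtt{done},\ \langle\rho,h,A\rangle) \leadsto \langle\rho',h',A'\rangle}
\]

Notations: We write $h(l, f)$ for the value $o(f)$ such that $l \in dom(h)$ and $h(l) = o$. We write $h[(l, f) \mapsto v]$
for the heap $h'$ that is equal to h except that the f field of the object at location l now has value v. Similarly,
$\rho[x \mapsto v]$ is the environment $\rho$ modified so that x now maps to v. The object $o_\diamond$ is the object satisfying
$o_\diamond(f) = \diamond$ for all fields f, and $\rho_\diamond$ is the environment such that $\rho_\diamond(x) = \diamond$ for all variables x. We consider
methods with only one parameter and name it p. lookup designates the dynamic lookup procedure that,
given a class name cn and a method name m, finds the first implementation of m in the class hierarchy
starting from the class of name cn and scanning the hierarchy bottom-up. It returns the corresponding
method declaration. ret is a specific local variable name that is used to store the result of each method.
$Reach_h(l)$ (resp. $Reach^{+}_h(l)$) denotes the set of values that are reachable from l through any sequence
(resp. any non-empty sequence) of fields in h.

Figure 3. Semantic Rules.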

that have been allocated by the current method invocation or by one of its callees. This
last component does not influence the semantic transitions: it is used to express the type
system interpretation defined in Sec. 3, but is not used in the final soundness theorem. Each
object is modeled in turn as a pair composed of its dynamic class and a finite function
from field names to values (references or the specific $\diamond$ reference for null values). We do not
deal with base values such as integers because their immutable values are irrelevant here.
The operational semantics of the language is defined (Fig. 3) by the evaluation relation
$\leadsto$ between configurations $\mathit{Comm} \times \mathit{State}$ and resulting states $\mathit{State}$. The set of locally
allocated locations is updated by both the $x := \mathtt{new}\ cn$ and the $x := m_{cn:X}(y)$ statements.
The execution of an unknown method call x:=?(y) results in a new heap $h'$ that keeps all
the previous objects that were not reachable from $\rho(y)$. It assigns the variable x a reference
that was either reachable from $\rho(y)$ in h or that has been allocated during this call and
hence not present in h.
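
The reachability sets used by these rules are easy to compute on a finite heap. The sketch
below is an illustration under simplifying assumptions: locations are Integer values, null
plays the role of the null reference, the dynamic class of objects is ignored, and only
locations (not the null value) are collected. The class name HeapReach is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: a heap maps a location to its fields, a field to a location or null.
class HeapReach {
    // Locations reachable from v through any (possibly empty) sequence of fields.
    static Set<Integer> reach(Map<Integer, Map<String, Integer>> h, Integer v) {
        Set<Integer> seen = new HashSet<>();
        Deque<Integer> todo = new ArrayDeque<>();
        if (v != null) todo.push(v);
        while (!todo.isEmpty()) {
            Integer l = todo.pop();
            if (!seen.add(l)) continue; // already visited
            Map<String, Integer> obj = h.getOrDefault(l, Collections.<String, Integer>emptyMap());
            for (Integer succ : obj.values()) {
                if (succ != null) todo.push(succ);
            }
        }
        return seen;
    }

    // Locations reachable from v through a non-empty sequence of fields.
    static Set<Integer> reachPlus(Map<Integer, Map<String, Integer>> h, Integer v) {
        Set<Integer> out = new HashSet<>();
        if (v == null) return out;
        Map<String, Integer> obj = h.getOrDefault(v, Collections.<String, Integer>emptyMap());
        for (Integer succ : obj.values()) {
            out.addAll(reach(h, succ));
        }
        return out;
    }
}
```

The difference between the two sets only shows on the starting location: it belongs to
reachPlus only if it lies on a cycle, which is why the rule for x:=?(y) removes
$Reach^{+}_h(\rho(y))$ rather than $Reach_h(\rho(y))$ from the set of locally allocated locations.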






T. JENSEN, F. KIRCHNER, AND D. PICHARDIE 



2.1. Policies and Inheritance. We impose restrictions on the way that inheritance can 
interact with copy policies. A method being re-defined in a sub-class can impose further 
constraints on how fields of the objects returned as result should be copied. A field already 
annotated deep with policy X must have the same annotation in the policy governing the 
re-defined method but a field annotated as shallow can be annotated deep for a re-defined 
method. 

Definition 2.1 (Overriding Copy Policies). A program p is well-formed with respect to
overriding copy policies if and only if for any method declaration Copy(X') m(x):= ... that
overrides (i.e. is declared with this signature in a subclass of a class cl) another method
declaration Copy(X) m(x):= ... declared in cl, we have

\[ \Pi_p(X) \subseteq \Pi_p(X'). \]

Intuitively, this definition imposes that the overriding copy policy is stronger than the policy
that it overrides. Lemma 2.4 below states this formally.



Example 2.2. The java.lang.Object class provides a clone() method of policy {} (because
its native clone() method is shallow on all fields). A class A declaring two fields f and
g can hence override the clone() method and give it a policy {(X, g)}. If a class B extends
A and overrides clone(), it must assign it a policy of the form {(X, g); ...} and could
declare the field f as deep. In our implementation, we let the programmer leave the policy
part that concerns fields declared in superclasses implicit, as it is systematically inherited.
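
The well-formedness condition of Definition 2.1 amounts to a set inclusion, which the
following sketch checks on the policies of Example 2.2. The encoding is an assumption made
for illustration: a policy is a set of strings of the form "X.f", the policy table is a map
from hypothetical policy names to such sets, and the policy identifier Y given to the field f
in class B is invented for the example.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical checker for the overriding condition of Definition 2.1.
class OverridingCheck {
    // True when the policy named overriding contains every field annotation
    // of the policy named overridden.
    static boolean mayOverride(Map<String, Set<String>> policies,
                               String overridden, String overriding) {
        return policies.get(overriding).containsAll(policies.get(overridden));
    }
}
```

With the empty policy for Object.clone(), {"X.g"} for A and {"X.g", "Y.f"} for B, class A may
override Object and B may override A, whereas replacing B's policy by A's in a sub-class of B
would be rejected.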



2.2. Semantics of Copy Policies. The informal semantics of the copy policy annotation
of a method is:

A copy method satisfies a copy policy X if and only if no memory cell that
is reachable from the result of this method following only fields with deep
annotations in X, is reachable from another local variable of the caller.

We formalize this by giving, in Fig. 4, a semantics to copy policies based on access paths.
An access path consists of a variable x followed by a sequence of field names separated by
a dot. An access path $\pi$ can be evaluated to a value v in a context $\langle\rho, h\rangle$ with a judgement
$\langle\rho, h\rangle \vdash \pi \Downarrow v$. Each path $\pi$ has a root variable $\downarrow\pi \in \mathit{Var}$. A judgement $\vdash \pi : \tau$ holds when
a path $\pi$ follows only deep fields in the policy $\tau$. The rule defining the semantics of copy
policies can be paraphrased as follows: For any path $\pi$ starting in x and leading to location
l only following deep fields in policy $\tau$, there cannot be another path leading to the same
location l which does not start in x.
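
This paraphrase can be turned into a small dynamic checker over an explicit heap, which may
help when reading the formal rule. The sketch below is an illustration under stated
assumptions, not the paper's static analysis: locations are Integer values, null is the null
reference, a policy table maps a policy identifier to its deep fields and their policy
identifiers, and the names PolicyCheck, satisfies, collect and reach are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical run-time check of "rho, h, x satisfies the policy named policy".
class PolicyCheck {
    static boolean satisfies(Map<String, Integer> env,
                             Map<Integer, Map<String, Integer>> heap,
                             Map<String, Map<String, String>> policies,
                             String x, String policy) {
        // Locations reached from x by paths that follow deep fields only.
        Set<Integer> deep = new HashSet<>();
        collect(env.get(x), policy, heap, policies, deep, new HashSet<String>());
        // No path rooted in another variable may reach one of these locations.
        for (Map.Entry<String, Integer> e : env.entrySet()) {
            if (e.getKey().equals(x)) continue;
            Set<Integer> shared = reach(heap, e.getValue());
            shared.retainAll(deep);
            if (!shared.isEmpty()) return false;
        }
        return true;
    }

    private static void collect(Integer l, String policy,
                                Map<Integer, Map<String, Integer>> heap,
                                Map<String, Map<String, String>> policies,
                                Set<Integer> deep, Set<String> visited) {
        if (l == null || !visited.add(l + "@" + policy)) return;
        deep.add(l);
        Map<String, Integer> obj = heap.getOrDefault(l, Collections.<String, Integer>emptyMap());
        Map<String, String> anns = policies.getOrDefault(policy, Collections.<String, String>emptyMap());
        for (Map.Entry<String, String> ann : anns.entrySet()) {
            // ann.getKey() is a deep field, ann.getValue() the policy of its target.
            collect(obj.get(ann.getKey()), ann.getValue(), heap, policies, deep, visited);
        }
    }

    // Locations reachable from v through any sequence of fields.
    static Set<Integer> reach(Map<Integer, Map<String, Integer>> heap, Integer v) {
        Set<Integer> seen = new HashSet<>();
        Deque<Integer> todo = new ArrayDeque<>();
        if (v != null) todo.push(v);
        while (!todo.isEmpty()) {
            Integer l = todo.pop();
            if (!seen.add(l)) continue;
            for (Integer succ : heap.getOrDefault(l, Collections.<String, Integer>emptyMap()).values()) {
                if (succ != null) todo.push(succ);
            }
        }
        return seen;
    }
}
```

On a two-cell list and a clone that duplicates the spine but shares the elements, this check
accepts the default List policy (only next is deep) and rejects the stronger DL policy (value
is deep as well), in line with the informal reading given above.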

Definition 2.3 (Secure Copy Method). A method m is said to be secure wrt. a copy signature
Copy(X){$\tau$} if and only if for all heaps $h_1, h_2 \in \mathit{Heap}$, local environments $\rho_1, \rho_2 \in \mathit{Env}$,
locally allocated locations $A_1, A_2 \in \mathcal{P}(\mathit{Loc})$, and variables $x, y \in \mathit{Var}$,

\[ (x := m_{cn:X}(y),\ \langle\rho_1, h_1, A_1\rangle) \leadsto \langle\rho_2, h_2, A_2\rangle \quad\text{implies}\quad \rho_2, h_2, x \models \tau \]

Note that because of virtual dispatch, the method executed by such a call may not be the
method found in cn but an overridden version of it. The security policy requires that all
overriding implementations still satisfy the policy $\tau$.

Lemma 2.4 (Monotonicity of Copy Policies wrt. Overriding).

\[ \tau_1 \subseteq \tau_2 \quad\text{implies}\quad \forall h, \rho, x,\ \ \rho, h, x \models \tau_2 \Rightarrow \rho, h, x \models \tau_1 \]
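
Access path syntax

\[ \pi \in P ::= x \mid \pi.f \]

Access path evaluation

\[
\langle\rho, h\rangle \vdash x \Downarrow \rho(x)
\qquad
\frac{\langle\rho, h\rangle \vdash \pi \Downarrow l \qquad h(l) = o}
     {\langle\rho, h\rangle \vdash \pi.f \Downarrow o(f)}
\]

Access path root

\[ \downarrow x = x \qquad \downarrow \pi.f = \downarrow \pi \]

Access path satisfying a policy

We suppose given $\Pi_p : \mathit{Policy}_{id} \to \mathit{Policy}$ the set of copy policies of the considered program p.

\[
\vdash x : \tau
\qquad
\frac{(X_1\ f_1) \in \tau,\ (X_2\ f_2) \in \Pi_p(X_1),\ \ldots,\ (X_n\ f_n) \in \Pi_p(X_{n-1})}
     {\vdash x.f_1.\cdots.f_n : \tau}
\]

Policy semantics

\[
\frac{\forall \pi, \pi' \in P,\ \forall l, l' \in \mathit{Loc},\quad
      \left(\begin{array}{l}
        x = \downarrow\pi,\ \ \downarrow\pi' \neq x, \\
        \langle\rho, h\rangle \vdash \pi \Downarrow l,\ \ \langle\rho, h\rangle \vdash \pi' \Downarrow l', \\
        \vdash \pi : \tau
      \end{array}\right)
      \text{ implies } l \neq l'}
     {\rho, h, x \models \tau}
\]

Figure 4. Copy Policy Semantics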



Proof. [See Coq proof Overriding.copy_policy_monotony [1]]

Under these hypotheses, for all access paths $\pi$, $\vdash \pi : \tau_1$ implies $\vdash \pi : \tau_2$. Thus the result
holds by definition of $\models$. $\Box$

Thanks to this lemma, it is sufficient to prove that each method is secure wrt. its own
copy signature to ensure that all potential overridings will also be secure wrt. that copy
signature.

2.3. Limitations of Copy Policies. The enforcement of our copy policies will ensure that 
certain sharing constraints are satisfied between fields of an object and its clone. However, 
in the current formalism we restrict the policy to talk about fields that are actually present 
in a class. The policy does not ensure properties about fields that are added in sub-classes. 
This means that an attacker could copy e.g., a list by using a new field to build the list, as 
in the following example. 

public class EvilList<V> extends List<V>
{
    @Shallow public List<V> evilNext;

    public EvilList(V val, List<V> next) {
        super(val, null);
        this.evilNext = next; }

    public List<V> clone() {
        return new EvilList<V>(value, evilNext); }

    // redefinition of all other methods to use the evilNext field
    // instead of next

}

The enforcement mechanism described in this article will determine that the clone() method
of class EvilList respects the copy policy declared for the List class in Section 1.2, because
this policy only speaks about the next field which is set to null. It will fail to discover
that the class EvilList creates a shallow copy of lists through the evilNext field. In order
to prevent this attack, the policy language must be extended, e.g., by adding a facility
for specifying that all fields except certain, specifically named fields must be copied deeply.
The enforcement of such policies will likely be able to reuse the analysis technique described
below.
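
The attack can be replayed with ordinary Java. The sketch below is a self-contained
variant of the example above, with the classes renamed SimpleList and EvilSimpleList
(hypothetical names) to avoid any clash with java.util.List and with the annotations left
as comments.

```java
// Base list; in the default policy of Section 1.2 only next is deep.
class SimpleList<V> implements Cloneable {
    public V value;                 // @Shallow
    public SimpleList<V> next;      // @Deep

    SimpleList(V value, SimpleList<V> next) {
        this.value = value;
        this.next = next;
    }

    @Override
    public SimpleList<V> clone() {
        return new SimpleList<>(value, next == null ? null : next.clone());
    }
}

// Sub-class that stores the tail in a new field, outside the inherited policy.
class EvilSimpleList<V> extends SimpleList<V> {
    public SimpleList<V> evilNext;  // not mentioned by the policy of SimpleList

    EvilSimpleList(V value, SimpleList<V> next) {
        super(value, null);
        this.evilNext = next;
    }

    @Override
    public SimpleList<V> clone() {
        return new EvilSimpleList<>(value, evilNext); // next stays null, evilNext is shared
    }
}
```

The clone of an EvilSimpleList has a null next field, so the constraint on next holds
trivially, while its evilNext field points to the very same tail as the original.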

3. Type and Effect System 

The annotations defined in the previous section are convenient for expressing a copy policy
but are not sufficiently expressive for reasoning about the data structures being copied. The
static enforcement of a copy policy hence relies on a translation of policies into a graph-based
structure (that we shall call types) describing parts of the environment of local variables
and the heap manipulated by a program. In particular, the types can express useful alias
information between variables and heap cells. In this section, we define the set of types,
an approximation (sub-typing) relation $\sqsubseteq$ on types, and an inference system for assigning
types to each statement and to the final result of a method.
The set of types is defined using the following symbols:

\[
\begin{array}{ll}
n \in N & t \in \mathfrak{t} = N + \{\bot, \top_{out}, \top\} \\
\Gamma \in \mathit{Var} \to \mathfrak{t} & \Delta \in \boldsymbol{\Delta} = N \rightharpoonup_{fin} \mathit{Field} \to \mathfrak{t} \\
\Theta \in \mathcal{P}(N) & T \in \mathcal{T} = (\mathit{Var} \to \mathfrak{t}) \times \boldsymbol{\Delta} \times \mathcal{P}(N)
\end{array}
\]

We assume given a set N of nodes. A value can be given a base type t in $N + \{\bot, \top_{out}, \top\}$.
A node n means the value has been locally allocated and is not shared. The symbol $\bot$
means that the value is equal to the null reference $\diamond$. The symbol $\top_{out}$ means that the
value contains a location that cannot reach a locally allocated object. The symbol $\top$ is
the specific "no-information" base type. As is standard in analysis of memory structures,
we distinguish between nodes that represent exactly one memory cell and nodes that may
represent several cells. If a node representing only one cell has an edge to another node,
then this edge can be forgotten and replaced when we assign a new value to the node; this
is called a strong update. If the node represents several cells, then the assignment may not
concern all these cells and edges cannot be forgotten. We can only add extra out-going
edges to the node; this is termed a weak update. In the graphical representations of types,
we use singly-circled nodes to designate "weak" nodes and doubly-circled nodes to represent
"strong" nodes.

A type is a triplet $T = (\Gamma, \Delta, \Theta) \in \mathcal{T}$ where

$\Gamma$: is a typing environment that maps (local) variables to base types.

$\Delta$: is a graph whose nodes are elements of N. The edges of the graph are labeled
with field names. The successor of a node is a base type. Edges over-approximate
the concrete points-to relation.

$\Theta$: is a set of nodes that represents necessarily only one concrete cell each. Nodes in
$\Theta$ are eligible to strong update while others (weak nodes) can only be weakly updated.

Example 3.1. The default List policy of Sec. 1.2 translates into the type

\[
\begin{array}{rcl}
\Gamma & = & [res \mapsto n_1,\ this \mapsto \top_{out}] \\
\Delta & = & [(n_1, next) \mapsto n_2,\ (n_2, next) \mapsto n_2,\ (n_1, value) \mapsto \top,\ (n_2, value) \mapsto \top] \\
\Theta & = & \{n_1\}.
\end{array}
\]

As mentioned in Sec. 1.3, this type enjoys a graphic representation corresponding to the
right-hand side of Fig. 1.

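In order to link types to the heap structures they represent, we will need to state
reachability predicates in the abstract domain. Therefore, the path evaluation relation is
extended to types using the following inference rules:

\[
(\Gamma, \Delta) \vdash x \Downarrow \Gamma(x)
\qquad
\frac{(\Gamma, \Delta) \vdash \pi \Downarrow n}{(\Gamma, \Delta) \vdash \pi.f \Downarrow \Delta[n, f]}
\qquad
\frac{(\Gamma, \Delta) \vdash \pi \Downarrow \top}{(\Gamma, \Delta) \vdash \pi.f \Downarrow \top}
\qquad
\frac{(\Gamma, \Delta) \vdash \pi \Downarrow \top_{out}}{(\Gamma, \Delta) \vdash \pi.f \Downarrow \top_{out}}
\]

Notice both $\top_{out}$ and $\top$ are considered as sink nodes for path evaluation purposes.⁴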
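
The type of Example 3.1 and the path evaluation rules can be encoded directly. The sketch
below is one possible encoding, given for illustration only: base types are represented as
strings, the class name TypeGraph and its members are hypothetical, and eval returns null
when no rule applies (for instance when a field is read from the null type).

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical encoding of a type (Gamma, Delta, Theta); base types are strings.
class TypeGraph {
    static final String TOP = "T", TOP_OUT = "Tout", BOT = "bot";

    final Map<String, String> gamma = new HashMap<>();              // variable -> base type
    final Map<String, Map<String, String>> delta = new HashMap<>(); // node -> field -> base type
    final Set<String> theta = new HashSet<>();                      // strong nodes

    void edge(String node, String field, String target) {
        delta.computeIfAbsent(node, k -> new HashMap<>()).put(field, target);
    }

    // Evaluates the access path x.f1...fk in (Gamma, Delta); null when no rule applies.
    String eval(String x, String... fields) {
        String t = gamma.get(x);
        for (String f : fields) {
            if (t == null || t.equals(BOT)) return null;      // no rule for these cases
            if (t.equals(TOP) || t.equals(TOP_OUT)) continue; // sink nodes absorb every field
            t = delta.getOrDefault(t, Collections.<String, String>emptyMap()).get(f);
        }
        return t;
    }

    // The type of Example 3.1.
    static TypeGraph listDefault() {
        TypeGraph g = new TypeGraph();
        g.gamma.put("res", "n1");
        g.gamma.put("this", TOP_OUT);
        g.edge("n1", "next", "n2");
        g.edge("n2", "next", "n2");
        g.edge("n1", "value", TOP);
        g.edge("n2", "value", TOP);
        g.theta.add("n1");
        return g;
    }
}
```

In this encoding, every path of the form res.next...next with at least one next evaluates to
the weak node n2, res alone evaluates to the strong node n1, and any path that goes through a
value field or starts from this ends in a sink.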

3.1. From Annotation to Type. The set of all copy policies $\Pi_p \subseteq \mathit{PolicyDecl}$ can be
translated into a graph $\Delta_p$ as described hereafter. We assume a naming process that
associates to each policy name $X \in \mathit{Policy}_{id}$ of a program a unique node $n'_X \in N$.

\[
\Delta_p = \bigcup_{X : \{(X_1, f_1); \ldots; (X_k, f_k)\} \in \Pi_p}
\left[ (n'_X, f_1) \mapsto n'_{X_1},\ \cdots,\ (n'_X, f_k) \mapsto n'_{X_k} \right]
\]

Given this graph, a policy $\tau = \{(X_1, f_1); \ldots; (X_k, f_k)\}$ that is declared in a class cl is
translated into a triplet:

\[
\Phi(\tau) = \left( n_\tau,\ \Delta_p \cup \left[ (n_\tau, f_1) \mapsto n'_{X_1},\ \cdots,\ (n_\tau, f_k) \mapsto n'_{X_k} \right],\ \{n_\tau\} \right)
\]

Note that we unfold the possibly cyclic graph $\Delta_p$ with an extra node $n_\tau$ in order to be
able to catch an alias information between this node and the result of a method, and hence
declare $n_\tau$ as strong. Take for instance the type in Fig. 1: were it not for this unfolding
step, the type would have consisted only of a weak node and a $\top$ node, with the variable
res mapping directly to the former. Note also that it is not necessary to keep (and even to
build) the full graph $\Delta_p$ in $\Phi(\tau)$ but only the part that is reachable from $n_\tau$.
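
The two formulas translate almost literally into code. The sketch below is an illustration
under stated assumptions: a policy is a map from deep fields to policy identifiers, nodes are
strings ("n'_X" for the node of policy X and "n_tau" for the fresh root), and the method phi
returns only the graph component of $\Phi(\tau)$, the root being "n_tau" and the set of strong
nodes being {"n_tau"}. The class name PolicyToType is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical translation of copy policies into the graph part of a type.
class PolicyToType {
    // Delta_p: one node n'_X per policy X, one edge per deep field of X.
    static Map<String, Map<String, String>> deltaP(Map<String, Map<String, String>> policies) {
        Map<String, Map<String, String>> delta = new HashMap<>();
        for (Map.Entry<String, Map<String, String>> p : policies.entrySet()) {
            delta.put("n'_" + p.getKey(), edges(p.getValue()));
        }
        return delta;
    }

    // Graph of Phi(tau): Delta_p plus the fresh root n_tau carrying the edges of tau.
    static Map<String, Map<String, String>> phi(Map<String, Map<String, String>> policies,
                                                Map<String, String> tau) {
        Map<String, Map<String, String>> delta = deltaP(policies);
        delta.put("n_tau", edges(tau));
        return delta;
    }

    // Maps each deep field f with policy X_i to the node n'_{X_i}.
    private static Map<String, String> edges(Map<String, String> policy) {
        Map<String, String> out = new HashMap<>();
        for (Map.Entry<String, String> e : policy.entrySet()) {
            out.put(e.getKey(), "n'_" + e.getValue());
        }
        return out;
    }
}
```

For the default List policy, the result has two nodes: the root n_tau and the node of
List.default, each with a next edge to the latter, which is the unfolding described above.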

3.2. Type Interpretation. The semantic interpretation of types is given in Fig. 5, in the
form of a relation

\[ \langle\rho, h, A\rangle \sim (\Gamma, \Delta, \Theta) \]

that states when a local allocation history A, a heap h and an environment $\rho$ are coherent
with a type $(\Gamma, \Delta, \Theta)$. The interpretation judgement amounts to checking that (i) for
every path $\pi$ that leads to a value v in the concrete memory and to a base type t in the
graph, t is a correct description of v, as formalized by the auxiliary type interpretation
$\langle\rho, h, A\rangle, (\Gamma, \Delta) \Vdash v \sim t$; (ii) every strong node in $\Theta$ represents a uniquely reachable value
in the concrete memory. The auxiliary judgement $\langle\rho, h, A\rangle, (\Gamma, \Delta) \Vdash v \sim t$ is defined by
case on t. The null value is represented by any type. The symbol $\top$ represents any value
and $\top_{out}$ those values that do not allow reaching a locally allocated location. A node n
represents a locally allocated memory location l such that every concrete path $\pi$ that leads
to l in $\langle\rho, h\rangle$ leads to node n in $(\Gamma, \Delta)$.

We now establish a semantic link between policy semantics and type interpretation. We show that if the final state of a copy method can be given a type of the form $\Phi(\tau)$ then this is a secure method wrt. the policy $\tau$.

4 The sink node status of $\top$ (resp. $\top_{out}$) can be understood as a way to state the following invariant enforced by our type system: when a cell points to an unspecified (resp. foreign) part of the heap, all successors of this cell are also unspecified (resp. foreign).
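Auxiliary type interpretation

$$
\frac{}{\langle\rho,h,A\rangle, (\Gamma,\Delta) \Vdash \diamond \sim t}
\qquad
\frac{}{\langle\rho,h,A\rangle, (\Gamma,\Delta) \Vdash v \sim \top}
\qquad
\frac{\mathit{Reach}_h(l) \cap A = \emptyset}{\langle\rho,h,A\rangle, (\Gamma,\Delta) \Vdash l \sim \top_{out}}
$$

$$
\frac{l \in A \qquad n \in \mathrm{dom}(\Delta) \qquad \forall \pi,\ \langle\rho,h\rangle \vdash \pi \Downarrow l \Rightarrow (\Gamma,\Delta) \vdash \pi \Downarrow n}
     {\langle\rho,h,A\rangle, (\Gamma,\Delta) \Vdash l \sim n}
$$

Main type interpretation

$$
\frac{\begin{array}{l}
\forall \pi, \forall t, \forall v,\ \ (\Gamma,\Delta) \vdash \pi \Downarrow t \ \wedge\ \langle\rho,h\rangle \vdash \pi \Downarrow v \ \Rightarrow\ \langle\rho,h,A\rangle, (\Gamma,\Delta) \Vdash v \sim t \\
\forall n \in \Theta, \forall \pi, \forall \pi', \forall l, \forall l',\ \ (\Gamma,\Delta) \vdash \pi \Downarrow n \ \wedge\ (\Gamma,\Delta) \vdash \pi' \Downarrow n \ \wedge\ \langle\rho,h\rangle \vdash \pi \Downarrow l \ \wedge\ \langle\rho,h\rangle \vdash \pi' \Downarrow l' \ \Rightarrow\ l = l'
\end{array}}
{\langle\rho,h,A\rangle \sim (\Gamma,\Delta,\Theta)}
$$

Figure 5. Type Interpretation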

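Value sub-typing judgment

$$
\frac{t \in \mathit{BaseType}}{\bot \leq_\sigma t}
\qquad
\frac{t \in \mathit{BaseType} \setminus N}{t \leq_\sigma \top}
\qquad
\frac{}{\top_{out} \leq_\sigma \top_{out}}
\qquad
\frac{n \in N}{n \leq_\sigma \sigma(n)}
$$

Main sub-typing judgment

$$
\frac{\begin{array}{lr}
\sigma \in \mathrm{dom}(\Delta_1) \to \mathrm{dom}(\Delta_2) + \{\top\} & (ST_1) \\
\forall t_1 \in \mathit{BaseType}, \forall \pi,\ (\Gamma_1,\Delta_1) \vdash \pi \Downarrow t_1 \Rightarrow \exists t_2 \in \mathit{BaseType},\ t_1 \leq_\sigma t_2 \wedge (\Gamma_2,\Delta_2) \vdash \pi \Downarrow t_2 & (ST_2) \\
\forall n_2 \in \Theta_2, \exists n_1 \in \Theta_1,\ \sigma^{-1}(n_2) = \{n_1\} & (ST_3)
\end{array}}
{(\Gamma_1,\Delta_1,\Theta_1) \sqsubseteq (\Gamma_2,\Delta_2,\Theta_2)}
$$

Figure 6. Sub-typing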



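Theorem 3.2. Let $\Phi(\tau) = (n_\tau, \Delta_\tau, \Theta_\tau)$, $\rho \in \mathit{Env}$, $A \in \mathcal{P}(\mathit{Loc})$, and $x \in \mathit{Var}$. Assume that, for all $y \in \mathit{Var}$ such that $y$ is distinct from $x$, $A$ is not reachable from $\rho(y)$ in a given heap $h$, i.e. $\mathit{Reach}_h(\rho(y)) \cap A = \emptyset$. If there exists a state of the form $\langle\rho',h,A\rangle$, a return variable res and a local variable type $\Gamma'$ such that $\rho'(\mathit{res}) = \rho(x)$, $\Gamma'(\mathit{res}) = n_\tau$ and $\langle\rho',h,A\rangle \sim (\Gamma', \Delta_\tau, \Theta_\tau)$, then $\rho, h, x \models \tau$ holds.

Proof. [See Coq proof InterpAnnot.sound_annotation_to_type [1]]

We consider two paths $\pi'$ and $x.\pi$ such that $\pi'$ does not start with $x$, $\langle\rho,h\rangle \vdash \pi' \Downarrow l$, $\vdash x.\pi : \tau$, $\langle\rho,h\rangle \vdash x.\pi \Downarrow l$, and look for a contradiction. Since $\vdash x.\pi : \tau$ and $\Gamma'(\mathit{res}) = n_\tau$, there exists a node $n \in \Delta_\tau$ such that $(\Gamma',\Delta_\tau) \vdash \mathit{res}.\pi \Downarrow n$. Furthermore $\langle\rho',h\rangle \vdash \mathit{res}.\pi \Downarrow l$, so we can deduce that $l \in A$. Thus we obtain a contradiction with $\langle\rho,h\rangle \vdash \pi' \Downarrow l$ because any path that starts from a variable other than $x$ cannot reach the elements in $A$. □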



3.3. Sub-typing. To manage control flow merge points we rely on a sub-typing relation $\sqsubseteq$ described in Fig. 6. A sub-type relation $(\Gamma_1,\Delta_1,\Theta_1) \sqsubseteq (\Gamma_2,\Delta_2,\Theta_2)$ holds if and only if $(ST_1)$ there exists a fusion function $\sigma$ from $\mathrm{dom}(\Delta_1)$ to $\mathrm{dom}(\Delta_2) + \{\top\}$. $\sigma$ is a mapping that merges nodes and edges in $\Delta_1$ such that $(ST_2)$ every element $t_1$ of $\Delta_1$ accessible from a path $\pi$ is mapped to an element $t_2$ of $\Delta_2$ accessible from the same path, such that $t_1 \leq_\sigma t_2$. In particular, this means that all successors of $t_1$ are mapped to successors of $t_2$. Incidentally, because $\top$ acts as a sink on paths, if $t_1$ is mapped to $\top$, then all its successors are mapped to $\top$ too. Finally, when a strong node in $\Delta_1$ maps to a strong node in $\Delta_2$, this image node cannot be the image of any other node in $\Delta_1$; in other terms, $\sigma$ is injective on strong nodes $(ST_3)$.



Intuitively, it is possible to go up in the type partial order either by merging, or by forgetting nodes in the initial graph. The following example shows three ordered types and their corresponding fusion functions. On the left, we forget the node pointed to by $y$ and hence forget all of its successors (see $(ST_2)$). On the right we fuse two strong nodes to obtain a weak node.

[Figure: three ordered types and their corresponding fusion functions]
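Both ways of going up in the order can be checked by hand on a small instance of the rules of Fig. 6. The types below are a hypothetical illustration: the node names $n_1, n_2, n_3, m, m_1$ and the single field $f$ are assumptions made for the example and are not taken from the original figure.

```latex
% Forgetting: y and everything below it are sent to \top.
\begin{align*}
T_1 &= \big([x \mapsto n_1,\ y \mapsto n_2],\
        [(n_1,f) \mapsto \bot,\ (n_2,f) \mapsto n_3,\ (n_3,f) \mapsto \bot],\
        \{n_1,n_2,n_3\}\big) \\
T_2 &= \big([x \mapsto m_1,\ y \mapsto \top],\ [(m_1,f) \mapsto \bot],\ \{m_1\}\big) \\
\sigma &= [\,n_1 \mapsto m_1,\ n_2 \mapsto \top,\ n_3 \mapsto \top\,]
\end{align*}
% (ST_2): x \Downarrow n_1 \leq_\sigma m_1, \quad y \Downarrow n_2 \leq_\sigma \top, \quad
%         y.f \Downarrow n_3 \leq_\sigma \top \text{ (sink)}.
% (ST_3): \sigma^{-1}(m_1) = \{n_1\} \text{ with } n_1 \in \Theta_1.

% Merging: two strong nodes are fused into one weak node.
\begin{align*}
T_1' &= \big([x \mapsto n_1,\ y \mapsto n_2],\
         [(n_1,f) \mapsto \bot,\ (n_2,f) \mapsto \bot],\ \{n_1,n_2\}\big) \\
T_2' &= \big([x \mapsto m,\ y \mapsto m],\ [(m,f) \mapsto \bot],\ \emptyset\big) \\
\sigma' &= [\,n_1 \mapsto m,\ n_2 \mapsto m\,]
\end{align*}
% (ST_3) holds vacuously because \Theta_2' = \emptyset; the node m could not be
% declared strong, since \sigma'^{-1}(m) = \{n_1,n_2\} is not a singleton.
```

In the first case the information about $y$ is lost entirely, whereas in the second case the shape is kept but the aliasing precision of the two strong nodes is given up.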



The logical soundness of this sub-typing relation is formally proved with two intermediate lemmas. The first one states that paths are preserved between subtypes, and that they evaluate into base types that are related by the subtyping function.

Lemma 3.3 (Pathing in subtypes). Assume $\langle\rho,h,A\rangle \sim T_1$, and let $\sigma$ be the fusion map defined by the assertion $T_1 \sqsubseteq T_2$. For any $\pi$, $r$ such that $\langle\rho,h\rangle \vdash \pi \Downarrow r$, for any $t_2$ such that $(\Gamma_2,\Delta_2) \vdash \pi \Downarrow t_2$:

$$
\exists t_1 \leq_\sigma t_2,\ (\Gamma_1,\Delta_1) \vdash \pi \Downarrow t_1 .
$$

Proof. [See Coq proof Misc.Access_Path_Eval_subtyp [1]] The proof follows directly from the definition of type interpretation and subtyping. □

The second lemma gives a local view on logical soundness of subtyping. 

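Lemma 3.4 (Local logical soundness of subtyping). Assume $(\Gamma_1,\Delta_1,\Theta_1) \sqsubseteq (\Gamma_2,\Delta_2,\Theta_2)$, and let $v$ be a value and $t_1, t_2$ some types.

$\langle\rho,h,A\rangle, (\Gamma_1,\Delta_1) \Vdash v \sim t_1$ and $t_1 \leq_\sigma t_2$ implies $\langle\rho,h,A\rangle, (\Gamma_2,\Delta_2) \Vdash v \sim t_2$.

Proof. [See Coq proof Misc.Interp_monotone [1]] We make a case for each rule of $t_1 \leq_\sigma t_2$. The only non-trivial case is for $v = l \in \mathit{Loc}$, $t_1 = n \in N$ and $t_2 = \sigma(n) \in \mathrm{dom}(\Delta_2)$. In this case we have to prove $\forall \pi,\ \langle\rho,h\rangle \vdash \pi \Downarrow l \Rightarrow (\Gamma_2,\Delta_2) \vdash \pi \Downarrow \sigma(n)$. Given such a path $\pi$, the hypothesis $\langle\rho,h,A\rangle, (\Gamma_1,\Delta_1) \Vdash l \sim n$ gives us $(\Gamma_1,\Delta_1) \vdash \pi \Downarrow n$. Then subtyping hypothesis $(ST_2)$ gives us a base type $t_2'$ such that $n \leq_\sigma t_2'$ and $(\Gamma_2,\Delta_2) \vdash \pi \Downarrow t_2'$. But necessarily $t_2 = t_2'$, so we are done. □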

The logical soundness of this sub-typing relation is then formally proved with the following theorem.

Theorem 3.5. For any type $T_1, T_2 \in \mathcal{T}$ and $\langle\rho,h,A\rangle \in \mathit{State}$, $T_1 \sqsubseteq T_2$ and $\langle\rho,h,A\rangle \sim T_1$ imply $\langle\rho,h,A\rangle \sim T_2$.

Proof. [See Coq proof InterpMonotony.Interpretation_monotone [1]]

We suppose $T_1$ is of the form $(\Gamma_1,\Delta_1,\Theta_1)$ and $T_2$ of the form $(\Gamma_2,\Delta_2,\Theta_2)$. From the definition of the main type interpretation (Fig. 5), we reduce the proof to proving the following two subgoals.

First, given a path $\pi$, a base type $t_2$ and a value $v$ such that $(\Gamma_2,\Delta_2) \vdash \pi \Downarrow t_2$ and $\langle\rho,h\rangle \vdash \pi \Downarrow v$, we must prove that $\langle\rho,h,A\rangle, (\Gamma_2,\Delta_2) \Vdash v \sim t_2$ holds. Since $(\Gamma_1,\Delta_1,\Theta_1) \sqsubseteq (\Gamma_2,\Delta_2,\Theta_2)$, there exists, by Lemma 3.3, a base type $t_1$ such that $(\Gamma_1,\Delta_1) \vdash \pi \Downarrow t_1$ and $t_1 \leq_\sigma t_2$. Since $\langle\rho,h,A\rangle \sim T_1$ holds, we can argue that $\langle\rho,h,A\rangle, (\Gamma_1,\Delta_1) \Vdash v \sim t_1$ holds too and conclude with Lemma 3.4.

Second, given a strong node $n_2 \in \Theta_2$, two paths $\pi$ and $\pi'$ and two locations $l$ and $l'$ such that $(\Gamma_2,\Delta_2) \vdash \pi \Downarrow n_2$, $(\Gamma_2,\Delta_2) \vdash \pi' \Downarrow n_2$, $\langle\rho,h\rangle \vdash \pi \Downarrow l$ and $\langle\rho,h\rangle \vdash \pi' \Downarrow l'$, we must prove that $l = l'$. As previously, there exist, by Lemma 3.3, $t_1$ and $t_1'$ such that $(\Gamma_1,\Delta_1) \vdash \pi \Downarrow t_1$, $t_1 \leq_\sigma n_2$, $(\Gamma_1,\Delta_1) \vdash \pi' \Downarrow t_1'$ and $t_1' \leq_\sigma n_2$. But then, by $(ST_3)$, there exists some strong node $n_1$ such that $t_1 = t_1' = n_1$, and we can obtain the desired equality from the hypothesis $\langle\rho,h,A\rangle \sim (\Gamma_1,\Delta_1,\Theta_1)$. □



3.4. Type and Effect System. The type system verifies, statically and class by class, that a program respects the copy policy annotations relative to a declared copy policy. The core of the type system concerns the typability of commands, which is defined through the following judgment:

$$
\Gamma, \Delta, \Theta \vdash c : \Gamma', \Delta', \Theta' .
$$

The judgment is valid if the execution of command $c$ in a state satisfying type $(\Gamma, \Delta, \Theta)$ will result in a state satisfying $(\Gamma', \Delta', \Theta')$ or will diverge.

Typing rules are given in Fig. 7. We explain a selection of rules below. The rules for if (*) then _ else _ fi, while (*) do _ done, sequential composition and most of the assignment rules are standard for flow-sensitive type systems. The rule for $x := \mathbf{new}$ "allocates" a fresh node $n$ with no edges in the graph $\Delta$ and lets $\Gamma(x)$ reference this node.

There are two rules concerning the instruction $x.f := y$ for assigning values to fields. Assume that the variable $x$ is represented by node $n$ (i.e., $\Gamma(x) = n$). In the first case (strong update), the node is strong and we update destructively (or add) the edge in the graph $\Delta$ from node $n$ labeled $f$ to point to the value of $\Gamma(y)$. The previous edge (if any) is lost because $n \in \Theta$ ensures that all concrete cells represented by $n$ are affected by this field update. In the second case (weak update), the node is weak. In order to be conservative, we must merge the previous shape with its updated version, since the content of $x.f$ is updated but another cell may exist that is represented by $n$ without being affected by this field update.
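The difference between the two field-assignment rules can be made concrete with a deliberately partial Java sketch. It is not the analysis implemented for this paper: the class `FieldUpdateSketch`, the string encoding of base types and the method names are assumptions, a missing edge is read as the bottom type, and the merge in the weak case only handles the two easy situations (equal targets, or one target being bottom). The general case needs a fusion function as in Fig. 6, and the sketch refuses it explicitly instead of guessing.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of strong versus weak update for x.f := y. */
final class FieldUpdateSketch {
    static final String BOT = "BOT"; // stands for the bottom base type

    /** Updates the edge (n, f) in delta, where n = Gamma(x) and target = Gamma(y). */
    static void assignField(Map<String, Map<String, String>> delta, Set<String> strong,
                            String n, String f, String target) {
        Map<String, String> edges = delta.computeIfAbsent(n, k -> new HashMap<>());
        if (strong.contains(n)) {
            edges.put(f, target);                    // strong update: the old edge is lost
            return;
        }
        String old = edges.getOrDefault(f, BOT);     // assumption: missing edge means bottom
        edges.put(f, join(old, target));             // weak update: keep both possibilities
    }

    /** Partial upper bound on base types; anything else would require node fusion. */
    static String join(String a, String b) {
        if (a.equals(b)) return a;
        if (a.equals(BOT)) return b;
        if (b.equals(BOT)) return a;
        throw new UnsupportedOperationException("merging distinct targets needs a fusion function");
    }
}
```

On a strong node, assigning `f` twice simply keeps the last target. On a weak node, the second assignment with a different target is where the real type system has to merge nodes.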

As for method calls $m(y)$, two cases arise depending on whether the method $m$ is copy-annotated or not. In each case, we also reason differently depending on the type of the argument $y$. If a method is associated with a copy policy $\tau$, we compute the corresponding type $(n_\tau, \Delta_\tau)$ and type the result of $x := m_{cn:X}(y)$ starting in $(\Gamma, \Delta, \Theta)$ with the result type consisting of the environment $\Gamma$ where $x$ now points to $n_\tau$, and the heap described by the disjoint union of $\Delta$ and $\Delta_\tau$. In addition, the set of strong nodes is augmented with $n_\tau$ since a copy method is guaranteed to return a freshly allocated node. The method call may potentially modify the memory referenced by its argument $y$, but the analysis has no means of tracking this. Accordingly, if $y$ is a locally allocated memory location of type $n$, we must remove all nodes reachable from $n$, and set all the successors of $n$ to $\top$. The other case to consider is when the method is not associated with a copy policy (written $x := ?(y)$). If the parameter $y$ is null or not locally allocated, then there is no way for the method call to access locally allocated memory and we know that $x$ points to a non-locally allocated object. Otherwise, $y$ is a locally allocated memory location of type $n$, and we must kill all its successors in the abstract heap.
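The "kill successors" step can be sketched as follows. This is a hedged illustration rather than the tool's implementation: the class `KillSuccSketch` and its encoding of types are hypothetical, only fields explicitly recorded in the map are treated as successors, and the choice to redirect variables and remaining edges that pointed to a removed node towards the top type is an assumption of the sketch (the text above only specifies the removal and the update of the successors of n).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of KillSucc_n on a type (Gamma, Delta, Theta), done in place. */
final class KillSuccSketch {
    static final String TOP = "TOP";

    /** Base types that are not graph nodes in this encoding. */
    static boolean isNode(String t) {
        return !(t.equals("TOP") || t.equals("TOP_OUT") || t.equals("BOT"));
    }

    static void killSucc(Map<String, String> gamma,
                         Map<String, Map<String, String>> delta,
                         Set<String> strong, String n) {
        // 1. Collect the nodes reachable from n in at least one step.
        Set<String> reachable = new HashSet<>();
        Deque<String> work = new ArrayDeque<>();
        work.push(n);
        while (!work.isEmpty()) {
            Map<String, String> edges = delta.get(work.pop());
            if (edges == null) continue;
            for (String succ : edges.values()) {
                if (isNode(succ) && reachable.add(succ)) work.push(succ);
            }
        }
        // 2. Remove them from the graph and from the set of strong nodes.
        delta.keySet().removeAll(reachable);
        strong.removeAll(reachable);
        // 3. Every recorded successor of n becomes top (n is gone if it lay on a cycle).
        Map<String, String> own = delta.get(n);
        if (own != null) own.replaceAll((f, t) -> TOP);
        // 4. Assumption: dangling references to removed nodes are also sent to top.
        gamma.replaceAll((x, t) -> reachable.contains(t) ? TOP : t);
        for (Map<String, String> edges : delta.values()) {
            edges.replaceAll((f, t) -> reachable.contains(t) ? TOP : t);
        }
    }
}
```

After `killSucc` on the node of the argument `y`, nothing the callee could have modified is described precisely any more, which is the conservative behaviour required by the two call rules.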
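Command typing rules

$$
\frac{}{\Gamma,\Delta,\Theta \vdash x := y : \Gamma[x \mapsto \Gamma(y)], \Delta, \Theta}
\qquad
\frac{n \text{ fresh in } \Delta}{\Gamma,\Delta,\Theta \vdash x := \mathbf{new}\ cn : \Gamma[x \mapsto n],\ \Delta[(n,\_) \mapsto \bot],\ \Theta \cup \{n\}}
$$

$$
\frac{\Gamma(y) = t \qquad t \in \{\top_{out}, \top\}}{\Gamma,\Delta,\Theta \vdash x := y.f : \Gamma[x \mapsto t], \Delta, \Theta}
\qquad
\frac{\Gamma(y) = n}{\Gamma,\Delta,\Theta \vdash x := y.f : \Gamma[x \mapsto \Delta[n,f]], \Delta, \Theta}
$$

$$
\frac{\Gamma(x) = n \qquad n \in \Theta}{\Gamma,\Delta,\Theta \vdash x.f := y : \Gamma, \Delta[n,f \mapsto \Gamma(y)], \Theta}
$$

$$
\frac{\Gamma(x) = n \qquad n \notin \Theta \qquad (\Gamma, \Delta[n,f \mapsto \Gamma(y)], \Theta) \sqsubseteq (\Gamma',\Delta',\Theta') \qquad (\Gamma,\Delta,\Theta) \sqsubseteq (\Gamma',\Delta',\Theta')}
     {\Gamma,\Delta,\Theta \vdash x.f := y : \Gamma',\Delta',\Theta'}
$$

$$
\frac{\begin{array}{ll}
\Gamma,\Delta,\Theta \vdash c_1 : \Gamma_1,\Delta_1,\Theta_1 & (\Gamma_1,\Delta_1,\Theta_1) \sqsubseteq (\Gamma',\Delta',\Theta') \\
\Gamma,\Delta,\Theta \vdash c_2 : \Gamma_2,\Delta_2,\Theta_2 & (\Gamma_2,\Delta_2,\Theta_2) \sqsubseteq (\Gamma',\Delta',\Theta')
\end{array}}
{\Gamma,\Delta,\Theta \vdash \mathbf{if}\ (*)\ \mathbf{then}\ c_1\ \mathbf{else}\ c_2\ \mathbf{fi} : \Gamma',\Delta',\Theta'}
$$

$$
\frac{\Gamma',\Delta',\Theta' \vdash c : \Gamma_0,\Delta_0,\Theta_0 \qquad (\Gamma,\Delta,\Theta) \sqsubseteq (\Gamma',\Delta',\Theta') \qquad (\Gamma_0,\Delta_0,\Theta_0) \sqsubseteq (\Gamma',\Delta',\Theta')}
     {\Gamma,\Delta,\Theta \vdash \mathbf{while}\ (*)\ \mathbf{do}\ c\ \mathbf{done} : \Gamma',\Delta',\Theta'}
$$

$$
\frac{\Gamma,\Delta,\Theta \vdash c_1 : \Gamma_1,\Delta_1,\Theta_1 \qquad \Gamma_1,\Delta_1,\Theta_1 \vdash c_2 : \Gamma_2,\Delta_2,\Theta_2}
     {\Gamma,\Delta,\Theta \vdash c_1; c_2 : \Gamma_2,\Delta_2,\Theta_2}
$$

$$
\frac{\Pi_p(X) = \tau \qquad \Phi(\tau) = (n_\tau,\Delta_\tau) \qquad \mathit{nodes}(\Delta) \cap \mathit{nodes}(\Delta_\tau) = \emptyset \qquad (\Gamma(y) = \bot) \vee (\Gamma(y) = \top_{out})}
     {\Gamma,\Delta,\Theta \vdash x := m_{cn:X}(y) : \Gamma[x \mapsto n_\tau],\ \Delta \cup \Delta_\tau,\ \Theta \cup \{n_\tau\}}
$$

$$
\frac{\begin{array}{c}
\Pi_p(X) = \tau \qquad \Phi(\tau) = (n_\tau,\Delta_\tau) \qquad \mathit{nodes}(\Delta) \cap \mathit{nodes}(\Delta_\tau) = \emptyset \\
\mathit{KillSucc}_n(\Gamma,\Delta,\Theta) = (\Gamma',\Delta',\Theta') \qquad \Gamma(y) = n
\end{array}}
{\Gamma,\Delta,\Theta \vdash x := m_{cn:X}(y) : \Gamma'[x \mapsto n_\tau],\ \Delta' \cup \Delta_\tau,\ \Theta' \cup \{n_\tau\}}
$$

$$
\frac{(\Gamma(y) = \bot) \vee (\Gamma(y) = \top_{out})}{\Gamma,\Delta,\Theta \vdash x := ?(y) : \Gamma[x \mapsto \top_{out}], \Delta, \Theta}
\qquad
\frac{\mathit{KillSucc}_n(\Gamma,\Delta,\Theta) = (\Gamma',\Delta',\Theta') \qquad \Gamma(y) = n}{\Gamma,\Delta,\Theta \vdash x := ?(y) : \Gamma'[x \mapsto \top_{out}], \Delta', \Theta'}
$$

$$
\frac{}{\Gamma,\Delta,\Theta \vdash \mathbf{return}\ x : \Gamma[\mathit{ret} \mapsto \Gamma(x)], \Delta, \Theta}
$$

Method typing rule

$$
\frac{\begin{array}{c}
[\,\cdot \mapsto \bot\,][x \mapsto \top_{out}], \emptyset, \emptyset \vdash c : \Gamma,\Delta,\Theta \\
\Pi_p(X) = \tau \qquad \Phi(\tau) = (n_\tau,\Delta_\tau) \qquad (\Gamma,\Delta,\Theta) \sqsubseteq (\Gamma',\Delta_\tau,\{n_\tau\}) \qquad \Gamma'(\mathit{ret}) = n_\tau
\end{array}}
{\vdash \mathit{Copy}(X)\ m(x) := c}
$$

Program typing rule

$$
\frac{\forall cl \in p,\ \forall md \in cl,\ \vdash md}{\vdash p}
$$

Notations: We write $\Delta[(n,\_) \mapsto \bot]$ for the update of $\Delta$ with a new node $n$ for which all successors are equal to $\bot$. We write $\mathit{KillSucc}_n$ for the function that removes all nodes reachable from $n$ (with at least one step) and sets all its successors equal to $\top$.

Figure 7. Type System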

Finally, the rule for method definition verifies the coherence of the result of analysing the body of a method $m$ with its copy annotation $\Phi(\tau)$. Type checking extends trivially to all methods of the program.

Note the absence of a rule for typing an instruction $x.f := y$ when $\Gamma(x) = \top$ or $\top_{out}$. In a first attempt, a sound rule would have been

$$
\frac{\Gamma(x) = \top}{\Gamma, \Delta \vdash x.f := y : \Gamma, \Delta[\,\cdot\,, f \mapsto \top]}
$$

Because $x$ may point to any part of the local shape, we must conservatively forget all knowledge about the field $f$. Moreover, we should also warn the caller of the current method that a field $f$ of its own local shape may have been updated. We choose to simply reject copy methods with such patterns. Such a policy is strong but has the merit of being easily understandable to the programmer: a copy method should only modify locally allocated objects to be typable in our type system. For similar reasons, we reject methods that attempt to make a method call on a reference of type $\top$, because we cannot track side effect modifications of such methods without losing the modularity of the verification mechanism.

     1  class LinkedList<E> implements Cloneable {
     2    private @Deep Entry<E> header;
     3
     4    private static class Entry<E> {
     5      @Shallow E element;
     6      @Deep Entry<E> next;
     7      @Deep Entry<E> previous;
     8    }
     9
    10    @Copy public Object clone() {
    11      LinkedList<E> clone = null;
    12      clone = (LinkedList<E>) super.clone();
    13      clone.header = new Entry<E>;
    14      clone.header.next = clone.header;
    15      clone.header.previous = clone.header;
    16      Entry<E> e = this.header.next;
    17      while (e != this.header) {
    18        Entry<E> n = new Entry<E>;
    19        n.element = e.element;
    20        n.next = clone.header;
    21        n.previous = clone.header.previous;
    22        n.previous.next = n;
    23        n.next.previous = n;
    24        e = e.next;
    25      }
    26      return clone;
    27    }
    28  }

[Diagrams: graphs of the intermediate types $T_i$ at selected program points, over the variables clone, n and e and the fields header, elem, next and prev]

Figure 8. Intermediate Types for java.util.LinkedList.clone()

Example 3.6 (Case Study: java.util.LinkedList). In this example, we demonstrate the use of the type system on a challenging example taken from the standard Java library. The companion web page provides a more detailed explanation of this example [1]. The class java.util.LinkedList provides an implementation of doubly-linked lists. A list is composed of a first cell that points through a field header to a collection of doubly-linked cells. Each cell has a link to the previous and the next cell and also to an element of (parameterized) type E. The clone method provided in the java.util library implements a "semi-shallow" copy where only cells of type E may be shared between the source and the result of the copy. In Fig. 8 we present a modified version of the original source code: we have inlined all method calls, except those to copy methods, and removed exception handling that leads to an abnormal return from the method.⁵ Note that there was one method call in the original code that was virtual and hence prevented inlining. It has been necessary to make a private version of this method. This makes sense because such a virtual call actually constitutes a potentially dangerous hook in a cloning method, as a re-defined implementation could be called when cloning a subclass of LinkedList.

5 Inlining is automatically performed by our tool, and the exception control flow graph is managed as standard control flow but omitted here for simplicity.
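The danger of such a hook can be reproduced on a toy class. The sketch below is a hypothetical illustration and is unrelated to the actual code of java.util.LinkedList: the classes `Box` and `LeakyBox` and the method `copyItems` are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical container whose clone() delegates part of the copy to a virtual method. */
class Box implements Cloneable {
    protected List<String> items = new ArrayList<>();

    /** Overridable hook called during cloning: this is the weak spot. */
    protected List<String> copyItems(List<String> source) {
        return new ArrayList<>(source);          // intended behaviour: a fresh list
    }

    @Override
    public Box clone() {
        try {
            Box copy = (Box) super.clone();      // shallow copy: items is still shared
            copy.items = copyItems(this.items);  // virtual call, resolved on the subclass
            return copy;
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e);         // cannot happen: Box is Cloneable
        }
    }
}

/** A subclass that redefines the hook so that the clone shares its state with the source. */
class LeakyBox extends Box {
    @Override
    protected List<String> copyItems(List<String> source) {
        return source;                           // no copy at all
    }
}
```

Cloning a `Box` yields a distinct `items` list, whereas cloning a `LeakyBox` through the very same `clone()` code returns an object whose `items` field aliases the original. Declaring `copyItems` private (or final) removes the hook, which corresponds to the private version of the method mentioned above.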

In Fig. 8 we provide several intermediate types that are necessary for typing this method ($T_i$ is the type before executing the instruction at line $i$). The call to super.clone at line 12 creates a shallow copy of the header cell of the list, which contains a reference to the original list. The original list is thus shared, a fact which is represented by an edge to $\top_{out}$ in type $T_{13}$.

The copy method then progressively constructs a deep copy of the list, by allocating a new node (see type $T_{14}$) and setting all paths clone.header, clone.header.next and clone.header.previous to point to this node. This is reflected in the analysis by a strong update to the node representing path clone.header to obtain the type $T_{16}$ that precisely models the alias between the three paths clone.header, clone.header.next and clone.header.previous (the Java syntax used here hides the temporary variable that is introduced to be assigned the value of clone.header and then be updated).

The type $T_{17}$ is the loop invariant necessary for type checking the whole loop. It is a super-type of $T_{16}$ (updated with $e \mapsto \top_{out}$) and of $T_{24}$, which represents the memory at the end of the loop body. The body of the loop allocates a new list cell (pointed to by variable n) (see type $T_{19}$) and inserts it into the doubly-linked list. The assignment in line 22 updates the weak node pointed to by path n.previous and hence merges the strong node pointed to by n with the weak node pointed to by clone.header, representing the spine of the list. The assignment at line 23 does not modify the type $T_{23}$.

Notice that the types used in this example show that a flow-insensitive version of the analysis could not have found this information. A flow-insensitive analysis would force the merge of the types at all program points, and the call to super.clone returns a type that is less precise than the types needed for the analysis of the rest of the method.

3.5. Type soundness. The rest of this section is devoted to the soundness proof of the type system. We consider the types $T = (\Gamma,\Delta,\Theta)$, $T_1 = (\Gamma_1,\Delta_1,\Theta_1)$, $T_2 = (\Gamma_2,\Delta_2,\Theta_2) \in \mathcal{T}$, a program $c \in \mathit{Prog}$, as well as the configurations $\langle\rho,h,A\rangle$, $\langle\rho_1,h_1,A_1\rangle$, $\langle\rho_2,h_2,A_2\rangle \in \mathit{State}$.

Assignments that modify the heap space can also modify the reachability properties of locations. The following lemma indicates how to reconstruct a path to $l_f$ in the initial heap from a path $\pi'$ to a given location $l_f$ in the assigned heap.

Lemma 3.7 (Path decomposition on assigned states). Assume given a path $\pi$, a field $f$, and locations $l$, $l'$ such that $\langle\rho,h\rangle \vdash \pi \Downarrow l'$. Then, for any path $\pi'$ and location $l_f$ such that $\langle\rho, h[l,f \mapsto l']\rangle \vdash \pi' \Downarrow l_f$, either

$$
\langle\rho,h\rangle \vdash \pi' \Downarrow l_f
$$

or $\exists \pi_z, \pi_1, \ldots, \pi_n, \pi_f$ such that:

$$
\left\{\begin{array}{l}
\pi' = \pi_z.f.\pi_1.f.\ \cdots\ .f.\pi_n.f.\pi_f \\
\langle\rho,h\rangle \vdash \pi_z \Downarrow l \\
\langle\rho,h\rangle \vdash \pi.\pi_f \Downarrow l_f \\
\forall i \in [1,n],\ \langle\rho,h\rangle \vdash \pi.\pi_i \Downarrow l
\end{array}\right.
$$

The second case of the conclusion of this lemma is illustrated in Fig. 9.
[Diagram with two panels showing the paths $\pi_z$, $\pi_1$, $\pi$, $\pi_f$ and the field $f$: (a) in $\langle\rho,h\rangle$; (b) in $\langle\rho,h[l,f \mapsto l']\rangle$]

Figure 9. An Illustration of Path Decomposition on Assigned States.



Proof. [See Coq proof Misc.Access_Path_Eval_putfield_case [1]] The proof is done by induction on $\pi$. □
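The second case of the lemma can be observed on a three-cell heap. The Java sketch below is an illustration only: the class `ConcretePathEval`, the encoding of locations as integers and of heaps as nested maps, and the chosen heap are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: access-path evaluation in a concrete heap, and heap update. */
final class ConcretePathEval {
    /** Evaluates variable.f1...fk; returns the location reached, or null if evaluation is stuck. */
    static Integer eval(Map<String, Integer> rho,
                        Map<Integer, Map<String, Integer>> heap,
                        String variable, String... fields) {
        Integer loc = rho.get(variable);
        for (String f : fields) {
            if (loc == null) return null;
            Map<String, Integer> cell = heap.get(loc);
            loc = (cell == null) ? null : cell.get(f);
        }
        return loc;
    }

    /** Returns a copy of the heap in which field f of location l points to target. */
    static Map<Integer, Map<String, Integer>> assign(
            Map<Integer, Map<String, Integer>> heap, int l, String f, int target) {
        Map<Integer, Map<String, Integer>> copy = new HashMap<>();
        for (Map.Entry<Integer, Map<String, Integer>> e : heap.entrySet()) {
            copy.put(e.getKey(), new HashMap<>(e.getValue()));
        }
        copy.computeIfAbsent(l, k -> new HashMap<>()).put(f, target);
        return copy;
    }
}
```

Take an environment with x at location 1 and y at location 2, and a heap where only the field g of location 2 is set (to location 3). After the assignment of location 2 to field f of location 1, the path x.f.g reaches location 3 although it was stuck before. It decomposes as $\pi_z.f.\pi_f$ with $\pi_z = x$, $\pi_f = g$ and $n = 0$, and the lemma's conditions hold in the initial heap with $\pi = y$: x evaluates to location 1 and y.g evaluates to location 3.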

We extend the previous lemma to paths in both the concrete heap and the graph types. 

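Lemma 3.8 (Pathing through strong field assignment). Assume $\langle\rho,h,A\rangle \sim (\Gamma,\Delta,\Theta)$ with $\rho(x) = l_x \in A$, $\Gamma(x) = n_x \in \Theta$, $\rho(y) = l_y$, and $\Gamma(y) = t_y$. Additionally, suppose that for some path $\pi$, value $v$, and type $t$:

$$
\begin{array}{l}
\langle\rho, h[l_x,f \mapsto l_y]\rangle \vdash \pi \Downarrow v \\
(\Gamma, \Delta[n_x,f \mapsto t_y]) \vdash \pi \Downarrow t .
\end{array}
$$

Then at least one of the following four statements holds:

$$
\begin{array}{lr}
\langle\rho,h\rangle \vdash \pi \Downarrow v \ \wedge\ (\Gamma,\Delta) \vdash \pi \Downarrow t & (1) \\
\exists \pi',\ \langle\rho,h\rangle \vdash y.\pi' \Downarrow v \ \wedge\ (\Gamma,\Delta) \vdash y.\pi' \Downarrow t & (2) \\
t = \top & (3) \\
v = \diamond & (4)
\end{array}
$$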


Proof. [See Coq proof Misc.strong_subst_prop [1]] The non-trivial part of the lemma concerns the situation when $t \neq \top$ and $v \neq \diamond$. In that case, the proof relies on Lemma 3.7. The two parts of the disjunction in Lemma 3.7 are used to prove one of the two first statements. If the first part of the disjunction holds, we can assume that $\langle\rho,h\rangle \vdash \pi \Downarrow v$. Then, since $\langle\rho,h,A\rangle \sim (\Gamma,\Delta,\Theta)$, we also have $(\Gamma,\Delta) \vdash \pi \Downarrow t$. This implies the first main statement. If the second part of the disjunction holds, then, by observing that $\langle\rho,h\rangle \vdash y \Downarrow l_y$, we can derive the sub-statement $\exists \pi_f,\ \langle\rho,h\rangle \vdash y.\pi_f \Downarrow v$. As previously, by assumption we also have $(\Gamma,\Delta) \vdash y.\pi_f \Downarrow t$, which implies our second main statement. □

We first establish a standard subject reduction theorem and then prove type soundness. 
We assume that all methods of the considered program are well-typed. 

Theorem 3.9 (Subject Reduction). Assume $T_1 \vdash c : T_2$ and $\langle\rho_1,h_1,A_1\rangle \sim T_1$. If $(c, \langle\rho_1,h_1,A_1\rangle) \leadsto \langle\rho_2,h_2,A_2\rangle$ then $\langle\rho_2,h_2,A_2\rangle \sim T_2$.

Proof. [See Coq proof Soundness.subject_reduction [1]] The proof proceeds by structural induction on the instruction $c$. For each reduction rule concerning $c$ (Fig. [3]), we prove that the resulting state is in relation to the type $T_2$, as defined by the main type interpretation rule in Fig. 5. This amounts to verifying that the two premises of the type interpretation rule are satisfied. One premise checks that all access paths lead to related (value, node) pairs. The other checks that all nodes that are designated as "strong" in the type interpretation indeed only represent unique locations.

We here present the most intricate part of the proof, which concerns graph nodes different from $\top$, $\bot$, or $\top_{out}$, and focus on variable and field assignment. The entire proof has been checked using the Coq proof management system.

If $c = x := y$ then $\langle\rho_2,h_2,A_2\rangle = \langle\rho_1[x \mapsto \rho_1(y)], h_1, A_1\rangle$. In $(\rho_2,h_2,\Gamma_2,\Delta_2)$ take a path $x.\pi$: since $\rho_2(x) = \rho_1(y)$ and $\Gamma_2(x) = \Gamma_1(y)$, $y.\pi$ is also a path in $(\rho_1,h_1,\Gamma_1,\Delta_1)$. Given that $\langle\rho_1,h_1,A_1\rangle \sim (\Gamma_1,\Delta_1,\Theta_1)$, we know that $y.\pi$ will lead to a value $v$ (formally, $\langle\rho_1,h_1\rangle \vdash y.\pi \Downarrow v$) and a node $n$ (formally, $(\Gamma_1,\Delta_1) \vdash y.\pi \Downarrow n$) such that $\langle\rho_1,h_1,A_1\rangle, (\Gamma_1,\Delta_1) \Vdash v \sim n$. The two paths being identical save for their prefix, this property also holds for $x.\pi$ (formally, $\langle\rho_1[x \mapsto \rho_1(y)],h_1,A_1\rangle, (\Gamma_1[x \mapsto \Gamma_1(y)],\Delta_1) \Vdash v \sim n$, $\langle\rho_1[x \mapsto \rho_1(y)],h_1\rangle \vdash x.\pi \Downarrow v$, and $(\Gamma_1[x \mapsto \Gamma_1(y)],\Delta_1) \vdash x.\pi \Downarrow n$). Paths not beginning with $x$ are not affected by the assignment, and so we can conclude that the first premise is satisfied.

For the second premise, let $n \in \Theta_2$ and assume that $n$ can be reached by two paths $\pi$ and $\pi'$ in $\Delta_2$. If neither or both of the paths begin with $x$ then, by assumption, the two paths will lead to the same location in $h_2 = h_1$. Otherwise, suppose that, say, $\pi$ begins with $x$ and $\pi'$ with a different variable $z$, and that they lead to $l$ and $l'$ respectively. Since $\Gamma_2(x) = \Gamma_1(y)$ and $\Delta_2 = \Delta_1$, then by assumption there is a path $\pi''$ in $(\rho_1,h_1,\Delta_1,\Gamma_1)$ that starts with $y$, and such that $\langle\rho_1,h_1\rangle \vdash \pi'' \Downarrow l$. As $z$ is not affected by the assignment, we also have that $\langle\rho_1,h_1\rangle \vdash \pi' \Downarrow l'$. Therefore, as $n \in \Theta_1$ and $\langle\rho_1,h_1,A_1\rangle \sim (\Gamma_1,\Delta_1,\Theta_1)$, we can conclude that $l = l'$. This proves that $\langle\rho_1[x \mapsto \rho_1(y)],h_1,A_1\rangle \sim (\Gamma_1[x \mapsto \Gamma_1(y)],\Delta_1,\Theta_1)$.

If $c = x.f := y$ then $\langle\rho_2,h_2,A_2\rangle = \langle\rho_1, h_1[(\rho_1(x),f) \mapsto \rho_1(y)], A_1\rangle$. Two cases arise, depending on whether $n = \Gamma(x)$ is a strong or a weak node.

If $c = x.f := y$ and $n = \Gamma(x) \in \Theta$, the node $n$ represents a unique concrete cell in $h$. To check the first premise, we make use of Lemma 3.8 on a given path $\pi$, a node $n$ and a location $l$ such that $(\Gamma_2,\Delta_2) \vdash \pi \Downarrow n$ and $\langle\rho_2,h_2\rangle \vdash \pi \Downarrow l$. This yields one of two main hypotheses. In the first case $\pi$ is not modified by the $f$-redirection (formally, $\langle\rho_1,h_1\rangle \vdash \pi \Downarrow l \wedge (\Gamma_1,\Delta_1) \vdash \pi \Downarrow n$), and by assumption $\langle\rho_1,h_1,A_1\rangle, (\Gamma_1,\Delta_1) \Vdash l \sim n$. Now consider any path $\pi_0$ such that $\langle\rho_2,h_2\rangle \vdash \pi_0 \Downarrow l$: there is a node $n_0$ such that $(\Gamma_2,\Delta_2) \vdash \pi_0 \Downarrow n_0$, and we can reapply Lemma 3.8 to find that $\langle\rho_1,h_1,A_1\rangle, (\Gamma_1,\Delta_1) \Vdash l \sim n_0$. Hence $n_0 = n$, and since nodes in $\Delta_1$ and $\Delta_2$ are untouched by the field assignment typing rule, we can conclude that the first premise is satisfied. In the second case ($\pi$ is modified by the $f$-redirection) there is a path $\pi'$ in $(\rho_1,h_1,\Gamma_1,\Delta_1)$ such that $y.\pi'$ leads respectively to $l$ and $n$ (formally, $\langle\rho_1,h_1\rangle \vdash y.\pi' \Downarrow l \wedge (\Gamma_1,\Delta_1) \vdash y.\pi' \Downarrow n$). As in the previous case, for any path $\pi_0$ such that $\langle\rho_2,h_2\rangle \vdash \pi_0 \Downarrow l$ and $(\Gamma_2,\Delta_2) \vdash \pi_0 \Downarrow n_0$, we have $\langle\rho_1,h_1,A_1\rangle, (\Gamma_1,\Delta_1) \Vdash l \sim n_0$, which implies that $n = n_0$, and thus we can conclude that $\pi_0$ leads to the same node as $\pi$, as required.

For the second premise, let $n \in \Theta_2$ and assume that $n$ can be reached by two paths $\pi$ and $\pi'$ in $\Delta_2$. The application of Lemma 3.8 to both of these paths yields the following combination of cases:

neither path is modified by the $f$-redirection: formally, $\langle\rho_1,h_1\rangle \vdash \pi \Downarrow l \wedge (\Gamma_1,\Delta_1) \vdash \pi \Downarrow n \wedge \langle\rho_1,h_1\rangle \vdash \pi' \Downarrow l' \wedge (\Gamma_1,\Delta_1) \vdash \pi' \Downarrow n$. By assumption, $l = l'$.

one of the paths is modified by the $f$-redirection: without loss of generality, assume $\langle\rho_1,h_1\rangle \vdash \pi' \Downarrow l' \wedge (\Gamma_1,\Delta_1) \vdash \pi' \Downarrow n$, and there is a path $\pi_\star$ in $(\rho_1,h_1,\Gamma_1,\Delta_1)$ such that $y.\pi_\star$ leads to $l$ in the heap, and $n$ in the graph (formally, $\langle\rho_1,h_1\rangle \vdash y.\pi_\star \Downarrow l \wedge (\Gamma_1,\Delta_1) \vdash y.\pi_\star \Downarrow n$). By assumption, $l = l'$.

both paths are modified by the $f$-redirection: we can find two paths $\pi_\star$ and $\pi'_\star$ such that $y.\pi_\star$ leads to $l$ in the heap and $n$ in the graph, and $y.\pi'_\star$ leads to $l'$ in the heap and $n$ in the graph (formally, $\langle\rho_1,h_1\rangle \vdash y.\pi_\star \Downarrow l \wedge (\Gamma_1,\Delta_1) \vdash y.\pi_\star \Downarrow n \wedge \langle\rho_1,h_1\rangle \vdash y.\pi'_\star \Downarrow l' \wedge (\Gamma_1,\Delta_1) \vdash y.\pi'_\star \Downarrow n$). By assumption, $l = l'$.

In all combinations, $l = l'$ in $h_1$. Since the rule for field assignment in the operational semantics preserves the locations, $l = l'$ in $h_2$ as well.

This concludes the proof of $\langle\rho_1, h_1[(\rho_1(x),f) \mapsto \rho_1(y)], A_1\rangle \sim (\Gamma_1, \Delta_1[n,f \mapsto \Gamma(y)], \Theta_1)$ when $n \in \Theta$.

If $c = x.f := y$ and $n = \Gamma(x) \notin \Theta$, here $n$ may represent multiple concrete cells in $h$. Let $\sigma_1$ and $\sigma_2$ be the mappings defined, respectively, by the hypotheses $(\Gamma_1,\Delta_1,\Theta_1) \sqsubseteq (\Gamma_2,\Delta_2,\Theta_2)$ and $(\Gamma_1,\Delta_1[n,f \mapsto \Gamma_1(y)],\Theta_1) \sqsubseteq (\Gamma_2,\Delta_2,\Theta_2)$. The first premise of the proof is proved by examining a fixed path $\pi$ in $(\rho_2,h_2,\Gamma_2,\Delta_2)$ that ends in $l_0$ in the concrete heap, and $n'$ in the abstract graph. Applying Lemma 3.7 to this path (formally, instantiating $l$ by $\rho_1(x)$, $l'$ by $\rho_1(y)$, $l_f$ by $l_0$, $\pi$ by $y$, and $\pi'$ by $\pi$) yields two possibilities. The first alternative is when $\pi$ is not modified by the $f$-redirection (formally, $\langle\rho_1,h_1\rangle \vdash \pi \Downarrow l_0$). Lemma 3.3 then asserts the existence of a node $n_0$ that $\pi$ evaluates to in $(\rho_1,h_1,\Gamma_1,\Delta_1)$ (formally, $(\Gamma_1,\Delta_1) \vdash \pi \Downarrow n_0$ with $n' = \sigma_1(n_0)$). Moreover, by assumption $n_0$ and $l_0$ are in correspondence (formally, $\langle\rho_1,h_1,A_1\rangle, (\Gamma_1,\Delta_1) \Vdash l_0 \sim n_0$). To prove that $l_0$ and $n'$ are in correspondence, we refer to the auxiliary type interpretation rule in Fig. 5, and prove that given a path $\pi_0$ that verifies $\langle\rho_2,h_2\rangle \vdash \pi_0 \Downarrow l_0$, the proposition $(\Gamma_2,\Delta_2) \vdash \pi_0 \Downarrow n'$ holds. Using Lemma 3.7 on $\pi_0$ (formally, instantiating $l$ by $\rho_1(x)$, $l'$ by $\rho_1(y)$, $l_f$ by $l_0$, $\pi$ by $y$, and $\pi'$ by $\pi_0$), the only non-immediate case is when $\pi_0$ goes through $f$. In this case, $\pi_0 = \pi_z.f.\pi_1.f.\ \cdots\ .f.\pi_n.f.\pi_f$, and we can reconstruct this as a path to $n'$ in $(\Gamma_2,\Delta_2)$ by assuming there are two nodes $n'_x$ and $n'_y$ such that $\sigma_1(\Gamma_1(x)) = n'_x$ and $\sigma_1(\Gamma_1(y)) = n'_y$, and observing:

• $\langle\rho_1,h_1\rangle \vdash \pi_z \Downarrow \rho_1(x)$, thus $(\Gamma_1,\Delta_1) \vdash \pi_z \Downarrow \Gamma_1(x)$ by assumption. Because access path evaluation is monotonic wrt. mappings (a direct consequence of clause $(ST_2)$ in the definition of sub-typing), we can derive $(\Gamma_2,\Delta_2) \vdash \pi_z \Downarrow n'_x$;

• for $i \in [1,n]$, $\langle\rho_1,h_1\rangle \vdash y.\pi_i \Downarrow \rho_1(x)$, thus by assumption $(\Gamma_1,\Delta_1) \vdash y.\pi_i \Downarrow \Gamma_1(x)$. By again using the monotonicity of access path evaluation, we can derive $(\Gamma_2,\Delta_2) \vdash y.\pi_i \Downarrow n'_x$;

• $\langle\rho_1,h_1\rangle \vdash y.\pi_f \Downarrow l_0$, thus by assumption $(\Gamma_1,\Delta_1) \vdash y.\pi_f \Downarrow n_0$. Hence $(\Gamma_2,\Delta_2) \vdash y.\pi_f \Downarrow n'$ by $n' = \sigma_1(n_0)$ and due to monotonicity;

• in $\Delta_1[n,f \mapsto \Gamma_1(y)]$, $n = \Gamma_1(x)$ points to $\Gamma_1(y)$ by $f$. Note that because the types $(\Gamma_1,\Delta_1,\Theta_1)$ and $(\Gamma_1,\Delta_1[n,f \mapsto \Gamma_1(y)],\Theta_1)$ share the environment $\Gamma_1$, we have $\sigma_2(\Gamma_1(x)) = \sigma_1(\Gamma_1(x)) = n'_x$ and $\sigma_2(\Gamma_1(y)) = \sigma_1(\Gamma_1(y)) = n'_y$. Hence in $\Delta_2$, $n'_x$ points to $n'_y$ by $f$, thanks to the monotonicity of $\sigma_2$.

This concludes the proof of $(\Gamma_2,\Delta_2) \vdash \pi_0 \Downarrow n'$. The cases when $n'_x$ and $n'_y$ do not exist are easily dismissed; we refer the reader to the Coq development for more details.

We now go back to our first application of Lemma 3.7 and tackle the second alternative, when $\pi$ is indeed modified by the $f$-redirection. Here $\pi$ can be decomposed into $\pi_z.f.\pi_1.f.\ \cdots\ .f.\pi_n.f.\pi_f$. We first find the node in $\Delta_1$ that corresponds to the location $l_0$: for this we need to find a path in $(\rho_1,h_1,\Gamma_2,\Delta_2)$ that leads to both $l_0$ and $n'$ (in order to apply Lemma 3.3; we can no longer assume, as in the first alternative, that $\pi$ leads to $l_0$ in $\langle\rho_1,h_1\rangle$). Since $\langle\rho_1,h_1\rangle \vdash y.\pi_f \Downarrow l_0$, the path $y.\pi_f$ might be a good candidate: we prove that it points to $n'$ in $\Delta_2$. Assume the existence of two nodes $n'_x$ and $n'_y$ in $\Delta_2$ such that $n'_x = \sigma_1(\Gamma_1(x)) = \sigma_2(\Gamma_1(x))$ and $n'_y = \sigma_1(\Gamma_1(y)) = \sigma_2(\Gamma_1(y))$ (the occurrences of non-existence are dismissed by reducing them, respectively, to the contradictory case when $\Gamma_2(x) = \top$, and to the trivial case when $n' = \top$; the equality between results of $\sigma_1$ and $\sigma_2$ stems from the same reasons as in bullet 4 previously). From the first part of the decomposition of $\pi$ one can derive, by assumption, that $\pi_z$ leads to $\Gamma_1(x)$ in $\Delta_1$, and, by monotonicity, to $n'_x$ in $\Delta_2$ (formally, $\langle\rho_1,h_1\rangle \vdash \pi_z \Downarrow \rho_1(x)$ entails $(\Gamma_1,\Delta_1) \vdash \pi_z \Downarrow \Gamma_1(x)$, which implies $(\Gamma_2,\Delta_2) \vdash \pi_z \Downarrow n'_x$). Similarly, we can derive that for any $i \in [1,n]$, $(\Gamma_1,\Delta_1) \vdash y.\pi_i \Downarrow \Gamma_1(x)$. We use monotonicity both to infer that in $\Delta_2$, $n'_x$ points to $n'_y$ by $f$, and that for any $i \in [1,n]$, $(\Gamma_2,\Delta_2) \vdash y.\pi_i \Downarrow n'_x$. From these three statements on $\Delta_2$, we have $(\Gamma_2,\Delta_2) \vdash \pi_z.f.\pi_1.f.\ \cdots\ .f.\pi_n \Downarrow n'_x$, and by path decomposition $(\Gamma_2,\Delta_2) \vdash y.\pi_f \Downarrow n'$. This allows us to proceed and apply Lemma 3.3, asserting the existence of a node $n_0$ in $(\rho_1,h_1,\Delta_1,\Gamma_1)$ that $y.\pi_f$ evaluates to (formally, $(\Gamma_1,\Delta_1) \vdash y.\pi_f \Downarrow n_0$ with $n' = \sigma_1(n_0)$). Moreover, by assumption $n_0$ and $l_0$ are in correspondence (formally, $\langle\rho_1,h_1,A_1\rangle, (\Gamma_1,\Delta_1) \Vdash l_0 \sim n_0$). The proof schema from here on is quite similar to what was done for the first alternative. As previously, we demonstrate that $l_0$ and $n'$ are in correspondence by taking a path $\pi_0$ such that $\langle\rho_2,h_2\rangle \vdash \pi_0 \Downarrow l_0$ and proving $(\Gamma_2,\Delta_2) \vdash \pi_0 \Downarrow n'$. As previously, using Lemma 3.7 on $\pi_0$ (formally, instantiating $l$ by $\rho_1(x)$, $l'$ by $\rho_1(y)$, $l_f$ by $l_0$, $\pi$ by $y$, and $\pi'$ by $\pi_0$), the only non-immediate case is when $\pi_0$ goes through $f$. In this case, $\pi_0 = \pi'_z.f.\pi'_1.f.\ \cdots\ .f.\pi'_n.f.\pi'_f$, and we can reconstruct this as a path to $n'$ in $(\Gamma_2,\Delta_2)$ by observing:

• $\langle\rho_1,h_1\rangle \vdash \pi'_z \Downarrow \rho_1(x)$, thus $(\Gamma_1,\Delta_1) \vdash \pi'_z \Downarrow \Gamma_1(x)$ by assumption. Monotonicity yields $(\Gamma_2,\Delta_2) \vdash \pi'_z \Downarrow n'_x$;

• for $i \in [1,n]$, $\langle\rho_1,h_1\rangle \vdash y.\pi'_i \Downarrow \rho_1(x)$, thus by assumption $(\Gamma_1,\Delta_1) \vdash y.\pi'_i \Downarrow \Gamma_1(x)$, and by monotonicity we can derive $(\Gamma_2,\Delta_2) \vdash y.\pi'_i \Downarrow n'_x$;

• $\langle\rho_1,h_1\rangle \vdash y.\pi'_f \Downarrow l_0$, thus since $l_0$ and $n_0$ are in correspondence, $(\Gamma_1,\Delta_1) \vdash y.\pi'_f \Downarrow n_0$. Hence $(\Gamma_2,\Delta_2) \vdash y.\pi'_f \Downarrow n'$ by $n' = \sigma_1(n_0)$ and by monotonicity.

This concludes the proof of $(\Gamma_2,\Delta_2) \vdash \pi_0 \Downarrow n'$, hence the demonstration of the first premise.

We are now left with the second premise, stating the unicity of strong node representation.
Assume $n \in \Theta_2$ can be reached by two paths $\pi$ and $\pi'$ in $\rho_2, h_2, \Gamma_2, \Delta_2$; we use
Lemma 3.7 to decompose both paths. The following cases arise:

neither path is modified by the $f$-redirection: formally, $\langle \rho_1, h_1 \rangle \vdash \pi \Downarrow l \wedge \langle \rho_1, h_1 \rangle \vdash \pi' \Downarrow l'$.
Lemma 3.3 ensures the existence of $n'$ such that $n = \sigma_1(n')$ and
$(\Gamma_1, \Delta_1) \vdash \pi \Downarrow n' \wedge (\Gamma_1, \Delta_1) \vdash \pi' \Downarrow n'$. Thus by assumption, $l = l'$.

one of the paths is modified by the $f$-redirection: without loss of generality, assume
$\langle \rho_1, h_1 \rangle \vdash \pi \Downarrow l$, and $\pi' = \pi_z.f.\pi_1.f \ldots f.\pi_n.f.\pi_f$ with $\langle \rho_1, h_1 \rangle \vdash y.\pi_f \Downarrow l'$. From the first
concrete path expression, Lemma 3.3 ensures that there is $n'$ such that $n = \sigma_1(n')$, and
$(\Gamma_1, \Delta_1) \vdash \pi \Downarrow n'$. Moreover, by assumption there exists a strong node $n''$ in $\Delta_1$ that $y.\pi_f$
leads to, and that is mapped to $n$ by $\sigma_1$ (formally, $n = \sigma_1(n'')$ and $(\Gamma_1, \Delta_1) \vdash y.\pi_f \Downarrow n''$).
Clause (ST3) of the definition of subtyping states the uniqueness of strong source nodes:
thus $n' = n''$, and we conclude by assumption that $l = l'$.

both paths are modified by the $f$-redirection: Lemma 3.7 provides two paths $\pi_f$ and
$\pi'_f$ such that $y.\pi_f$ leads to $l$, and $y.\pi'_f$ leads to $l'$ in the heap (formally,
$\langle \rho_1, h_1 \rangle \vdash y.\pi_f \Downarrow l \wedge \langle \rho_1, h_1 \rangle \vdash y.\pi'_f \Downarrow l'$). As in the previous case, we use the assumption and
clause (ST3) of the main subtyping definition to exhibit the common node $n'$ in $\Delta_1$ that is
pointed to by both paths. By assumption, $l = l'$.

This concludes the demonstration of the second premise, and of the $c = x.f := y$ case in our
global induction. □

Theorem 3.10 (Type soundness). If $\vdash p$ then all methods $m$ declared in the program $p$ are
secure, i.e., respect their copy policy.

Proof. [See Coq proof Soundness.soundness [1]] Given a method $m$ and a copy signature
Copy($X$){$\tau$} attached to it, we show that for all heaps $h_1, h_2 \in \mathit{Heap}$, local environments
$\rho_1, \rho_2 \in \mathit{Env}$, locally allocated locations $A_1, A_2 \in \mathcal{P}(\mathit{Loc})$, and variables $x, y \in \mathit{Var}$,

$(x := m_{cn:X}(y), \langle \rho_1, h_1, A_1 \rangle) \leadsto \langle \rho_2, h_2, A_2 \rangle$ implies $\rho_2, h_2, x \models \tau$.

Following the rule defining the semantics of calls to copy methods, we consider the
situation where there exists a potential overriding of $m$, Copy($X'$) $m(a) := c$, such that

$(c, \langle \rho_\emptyset[a \mapsto \rho_1(y)], h_1, \emptyset \rangle) \leadsto \langle \rho', h_2, A' \rangle$.

Writing $\tau'$ for the policy attached to $X'$, we know that $\tau \subseteq \tau'$. By Lemma 2.4 it is then
sufficient to prove that $\rho_2, h_2, x \models \tau'$ holds.

By typability of Copy($X'$) $m(a) := c$, there exist $\Gamma', \Gamma, \Delta, \Theta$ such that $\Gamma'(\mathit{ret}) = n_\tau$,
$(\Gamma, \Delta, \Theta) \sqsubseteq (\Gamma', \Delta_\tau, \{n_\tau\})$ and $[\,\cdot \mapsto \bot][x \mapsto \top_{out}], \emptyset, \emptyset \vdash c : \Gamma, \Delta, \Theta$.

Using Theorem 3.9, we know that $\langle \rho', h_2, A' \rangle \sim (\Gamma, \Delta, \Theta)$ holds and by subtyping between
$(\Gamma, \Delta, \Theta)$ and $(\Gamma', \Delta_\tau, \{n_\tau\})$ we obtain that $\langle \rho', h_2, A' \rangle \sim (\Gamma', \Delta_\tau, \{n_\tau\})$ holds.
Theorem 3.2 then immediately yields that $\rho_2, h_2, x \models \tau$. □



4. Inference 

In order to type-check a method with the previous type system, it is necessary to infer
intermediate types at each loop header, conditional junction point and weak field assignment
point. A standard approach consists in turning the previous typing problem into a
fixpoint problem in a suitable sup-semi-lattice structure. This section presents the lattice
that we put on $(\mathcal{T}, \sqsubseteq)$. Proofs are generally omitted for lack of space but can be found in
the companion report. Typability is then checked by computing a suitable least fixpoint
in this lattice. We end this section by proposing a widening operator that is necessary to
prevent infinite iterations.
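As a rough illustration of the fixpoint formulation, the following sketch iterates a transfer function from a bottom element, joining at each step until the value is stable. It is a generic toy on the lattice of integer sets and not the inference engine of the tool; the names FixpointSketch, leastFixpoint and demo are hypothetical, and the loop assumes a monotone step function and a lattice without infinite ascending chains (otherwise a widening is needed).

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.function.BinaryOperator;
import java.util.function.UnaryOperator;

// Hypothetical helper: a generic join-based fixpoint iteration.
class FixpointSketch {
    /** Iterates x := join(x, step(x)) from bottom until the value is stable. */
    static <T> T leastFixpoint(T bottom, UnaryOperator<T> step, BinaryOperator<T> join) {
        T current = bottom;
        while (true) {
            T next = join.apply(current, step.apply(current));
            if (next.equals(current)) {
                return current;
            }
            current = next;
        }
    }

    /** Toy instance on the lattice of integer sets ordered by inclusion. */
    static Set<Integer> demo() {
        // step(S) = {0} plus the successor of every element of S below 3.
        UnaryOperator<Set<Integer>> step = s -> {
            Set<Integer> out = new TreeSet<>();
            out.add(0);
            for (int n : s) {
                if (n < 3) {
                    out.add(n + 1);
                }
            }
            return out;
        };
        // The join of this toy lattice is set union.
        BinaryOperator<Set<Integer>> union = (a, b) -> {
            Set<Integer> out = new TreeSet<>(a);
            out.addAll(b);
            return out;
        };
        return leastFixpoint(new TreeSet<Integer>(), step, union);
    }
}
```

In the setting of this section, the elements would be types, the join would be the merge operator on types, and the equality test would be performed on garbage-collected types.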

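Lemma 4.1. The binary relation $\sqsubseteq$ is a preorder on $\mathcal{T}$.

Proof. The relation is reflexive because for all types $T \in \mathcal{T}$, $T \sqsubseteq_{id} T$. The relation is
transitive because if there exist types $T_1, T_2, T_3 \in \mathcal{T}$ such that $T_1 \sqsubseteq_{\sigma_1} T_2$ and $T_2 \sqsubseteq_{\sigma_2} T_3$
for some fusion maps $\sigma_1$ and $\sigma_2$, then $T_1 \sqsubseteq_{\sigma_3} T_3$ for the fusion map $\sigma_3$ defined by
$\sigma_3(n) = \sigma_2(\sigma_1(n))$ if $\sigma_1(n) \in N$, or $\top$ otherwise. □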
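The composition used in the transitivity argument can be sketched as follows. This is only an illustration under assumptions: fusion maps are represented as string-to-string maps, the constant TOP stands for $\top$, and the names FusionMaps and compose are hypothetical. The second map is assumed to be defined on every node of the intermediate type.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: composition of two fusion maps.
class FusionMaps {
    /** Stands for the abstract value "top"; the string itself is an arbitrary choice. */
    static final String TOP = "TOP";

    /**
     * Returns sigma3 with sigma3(n) = sigma2(sigma1(n)) when sigma1(n) is a node
     * of the intermediate type, and TOP otherwise.
     */
    static Map<String, String> compose(Map<String, String> sigma1,
                                       Map<String, String> sigma2,
                                       Set<String> middleNodes) {
        Map<String, String> sigma3 = new HashMap<>();
        for (Map.Entry<String, String> e : sigma1.entrySet()) {
            String image = e.getValue();
            sigma3.put(e.getKey(), middleNodes.contains(image) ? sigma2.get(image) : TOP);
        }
        return sigma3;
    }
}
```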

We write $\equiv$ for the equivalence relation defined by $T_1 \equiv T_2$ if and only if $T_1 \sqsubseteq T_2$ and
$T_2 \sqsubseteq T_1$. Although this entails that $\sqsubseteq$ is a partial order structure on top of $(\mathcal{T}, \equiv)$, equality
and order testing remain difficult using only this definition. Instead of considering the
quotient of $\mathcal{T}$ with $\equiv$, we define a notion of well-formed types on which $\sqsubseteq$ is antisymmetric.
To do this, we assume that the sets of nodes, variable names and field names are countable
and we write $n_i$ (resp. $x_i$ and $f_i$) for the $i$th node (resp. variable and field). A type
$(\Gamma, \Delta, \Theta)$ is well-formed if every node in $\Delta$ is reachable from a node in $\Gamma$ and the nodes in
$\Delta$ follow a canonical numbering based on a breadth-first traversal of the graph. Any type
can be garbage-collected into a canonical well-formed type by removing all nodes that are
unreachable from variables and renaming all remaining nodes using a fixed strategy based on a
total ordering on variable names and field names and a breadth-first traversal. We call

    // Initialization.
    // α-nodes are sets in t.
    // α-transitions can be
    // non-deterministic.
    α = lift(Γ1, Γ2, Δ1 ∪ Δ2)

    // Determinize α-transitions:
    // start with the entry points,
    for {(x,t); (x,t')} ⊆ (Γ1 × Γ2) {
        fusion({t, t'})
    }

    // Determinize α-transitions:
    // propagate inside the graph,
    while ∃u ∈ α, ∃f ∈ Field, |succ(u, f)| > 1 {
        fusion(succ(u, f))
    }

    // α is now fully determinized:
    // convert it back into a type.
    (Γ, Δ, Θ) = ground(Γ1, Γ2, α)


    // S is a set of α-nodes.
    // (|S|) denotes a node labelled by S.
    void fusion(S) {
        // Create a new α-node.
        NDGnode n = (|S|)
        α ← α + n
        // Recreate all edges from the fused
        // nodes on the new node,
        for t ∈ S {
            for f ∈ Field {
                if ∃u, α(t, f) = u {
                    // Outbound edges.
                    α ← α with (n, f) ↦ u
                }
                if ∃n', α(n', f) = t {
                    // Inbound edges.
                    α ← α with (n', f) ↦ n
                }
            }
            // Delete the fused node.
            α ← α − t
        }
    }

Figure 10. Join Algorithm

this transformation GC. The following example shows the effect of GC using a canonical 
numbering. 

[Diagram not recoverable from the source; it shows a type graph over the variables x1, x2, x3
before and after applying GC.]

Since, by definition, $\sqsubseteq$ only deals with reachable nodes, the GC function is a $\equiv$-morphism
and respects type interpretation. This means that an inference engine can at any time
replace a type by a garbage-collected version. This is useful to perform an equivalence test
in order to check whether a fixpoint iteration has ended.
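A minimal sketch of the canonical numbering behind GC is given below, under simplifying assumptions: nodes are named by strings, base values such as $\bot$ and $\top$ are ignored, and sorted maps supply the total ordering on variable and field names. The names TypeGc and canonicalNumbering are hypothetical and do not come from the tool.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: breadth-first canonical numbering of reachable nodes.
class TypeGc {
    /**
     * Maps each node reachable from a variable to its canonical index.
     * Unreachable nodes are absent from the result, i.e. garbage-collected.
     */
    static Map<String, Integer> canonicalNumbering(TreeMap<String, String> gamma,
                                                   Map<String, TreeMap<String, String>> delta) {
        Map<String, Integer> index = new LinkedHashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        // Roots are visited in variable-name order.
        for (String root : gamma.values()) {
            if (!index.containsKey(root)) {
                index.put(root, index.size());
                queue.add(root);
            }
        }
        while (!queue.isEmpty()) {
            String node = queue.remove();
            TreeMap<String, String> fields = delta.getOrDefault(node, new TreeMap<String, String>());
            // Successors are visited in field-name order.
            for (String succ : fields.values()) {
                if (!index.containsKey(succ)) {
                    index.put(succ, index.size());
                    queue.add(succ);
                }
            }
        }
        return index;
    }
}
```

Two graphs that differ only by node names and unreachable nodes receive the same numbering, so comparing the renumbered graphs for equality is enough to detect that an iteration has stabilised.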

Lemma 4.2. For all well-formed types $T_1, T_2 \in \mathcal{T}$, $T_1 \equiv T_2$ iff $T_1 = T_2$.

Definition 4.3. Let $\sqcup$ be an operator that merges two types according to the algorithm in
Fig. 10.

The procedure has $T_1 = (\Gamma_1, \Delta_1, \Theta_1)$ and $T_2 = (\Gamma_2, \Delta_2, \Theta_2)$ as input, then takes the
following steps.

(1) It first makes the disjoint union of $\Delta_1$ and $\Delta_2$ into a non-deterministic graph (NDG) $\alpha$,
where nodes are labelled by sets of elements in $t$. This operation is performed by the
lift function, which maps nodes to singleton nodes, and fields to transitions.

(2) It joins together the nodes in $\alpha$ referenced by $\Gamma_1$ and $\Gamma_2$ for the same variable, using
the fusion algorithm.

Footnote: Note that $\Gamma_i$-bindings are not represented in $\alpha$, but that node set fusions are trivially
traceable. This allows us to safely ignore $\Gamma_i$ during the following step and still perform a
correct graph reconstruction.






    // Convert a type to an NDG
    NDG lift(Γ1, Γ2, Δ) {
        NDG α = undef
        for x ∈ Var {
            α ← α + (|{Γ1(x)}|) + (|{Γ2(x)}|)
        }
        for n ∈ Δ {
            if ∃f ∈ Fields, ∃b ∈ BaseType, Δ[n, f] = b {
                α ← α + (|{n}|) + (|{b}|)
                α ← α with ((|{n}|), f) ↦ (|{b}|)
            }
        }
        return α
    }

    // Convert an NDG to a type
    T ground(Γ1, Γ2, α) {
        (Var → t) Γ = λx.⊥
        Δ Δ = undef
        for N ∈ α {
            for x ∈ Var, Γ1(x) ∈ N ∨ Γ2(x) ∈ N {
                Γ ← Γ with x ↦ N↓
            }
        }
        for N, N' ∈ α {
            if α(N, f) = N' {
                Δ ← Δ with (N↓, f) ↦ N'↓
            }
        }
        return (Γ, Δ)
    }

    // ≤σ-sup function
    BaseType ↓(N) {
        if ⊤ ∈ N return ⊤
        else if ∀c ∈ N, c = ⊥ return ⊥
        else return freshNode()
    }

Figure 11. Auxiliary join functions



[Diagram not recoverable from the source; it shows two graphs being joined by fusing n1 with n3
into (|{n1, n3}|) and n2 with n4 into (|{n2, n4}|).]

Figure 12. An example of join



(3) Then it scans the NDG and merges all non-deterministic successors of nodes.

(4) Finally it uses the ground function to recreate a graph $\Delta$ from the now-deterministic
graph $\alpha$. This function operates by pushing a node set to a node labelled by the
$\le_\sigma$-sup of the set. The result environment $\Gamma$ is derived from $\Gamma_i$ and $\alpha$ before the
$\Delta$-reconstruction.

All state fusions are recorded in a map $\sigma$ which binds nodes in $\Delta_1 \cup \Delta_2$ to nodes in $\Delta$.
Figure 11 contains the auxiliary functions used in the above procedure. Figure 12 unfolds
the algorithm on a small example.

Theorem 4.4. The operator $\sqcup$ defines a sup-semi-lattice on types.

Proof. First note that $\sigma_1$ and $\sigma_2$, the functions associated with the two respective subtyping
relations, can easily be reconstructed from $\sigma$.

Upper bound: Let $(T, \sigma) = T_1 \sqcup T_2$: we prove that $T_1 \sqsubseteq T$ and $T_2 \sqsubseteq T$. Hypothesis (ST2)
is discharged by case analysis, on $t_1$ and on $t_2$. The general argument used is that the
join algorithm does not delete any edges, thus preserving all paths in the initial graphs.

Least of the upper bounds: Let $(T, \sigma) = T_1 \sqcup T_2$. Assume there exists $T'$ such that
$T_1 \sqsubseteq T'$ and $T_2 \sqsubseteq T'$. Then we prove that $T \sqsubseteq T'$. The proof consists in checking
that the join algorithm produces, at each step, an intermediary pseudo-type $T$ such
that $T \sqsubseteq T'$. The concrete nature of the algorithm drives the following, more detailed
decomposition.

(1) Given a function $\sigma$, define a state mapping function $\preceq_\sigma$, and a $\sqsubseteq_\sigma$-like relation $\sqsubset_\sigma$
on non-deterministic graphs. The aim with this relation is to emulate the properties
of $\sqsubseteq_\sigma$, lifting the partial order $\le_\sigma$ on nodes to sets of nodes (cells). Lift $T'$ into an
NDG $\alpha'$.

(2) Using the subtyping relations between $T_1$, $T_2$, and $T'$, establish that the disjoint
union and join steps produce an intermediary NDG $\beta$ such that $\beta \sqsubset_\sigma \alpha'$.

(3) Ensure that the fusions operated by the join algorithm in $\beta$ produce an NDG $\gamma$ such
that $\gamma \sqsubset_\sigma \alpha'$. This is done by case analysis, depending on whether the fusion
takes place during the first entry point processing phase or occurs later.

(4) Using the $t$-lattice, show that the ground operation on the NDG $\gamma$ produces a type $T$
that is a sub-type of $T'$. □
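The join procedure of Definition 4.3 can be approximated by the sketch below. It is not the algorithm of Figure 10 verbatim: instead of explicit set-labelled NDG nodes it uses a union-find structure to record fusions, it assumes the two input graphs have disjoint node names, and it ignores the base values $\bot$ and $\top$ (so the sup function of the ground step reduces to picking a representative). The names JoinSketch, find, fuse and join are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: determinization by fusion, with union-find as the fusion record.
class JoinSketch {
    private final Map<String, String> parent = new HashMap<>();

    /** Union-find lookup: the representative of the fused set containing n. */
    String find(String n) {
        parent.putIfAbsent(n, n);
        String p = parent.get(n);
        if (p.equals(n)) {
            return n;
        }
        String root = find(p);
        parent.put(n, root);
        return root;
    }

    /** Fuses the sets containing a and b; returns true if they were distinct. */
    boolean fuse(String a, String b) {
        String ra = find(a);
        String rb = find(b);
        if (ra.equals(rb)) {
            return false;
        }
        parent.put(rb, ra);
        return true;
    }

    /**
     * gamma1 and gamma2 map variables to nodes; delta maps each node of the
     * disjoint union to its field successors. Returns the joined graph,
     * expressed over set representatives.
     */
    Map<String, Map<String, String>> join(Map<String, String> gamma1,
                                          Map<String, String> gamma2,
                                          Map<String, Map<String, String>> delta) {
        // Entry points: fuse the two nodes bound to the same variable.
        for (String x : gamma1.keySet()) {
            if (gamma2.containsKey(x)) {
                fuse(gamma1.get(x), gamma2.get(x));
            }
        }
        // Propagation: fuse successors until every (set, field) has one target.
        Map<String, Map<String, String>> result = new HashMap<>();
        boolean changed = true;
        while (changed) {
            changed = false;
            result = new HashMap<>();
            for (Map.Entry<String, Map<String, String>> node : delta.entrySet()) {
                Map<String, String> out =
                        result.computeIfAbsent(find(node.getKey()), k -> new HashMap<>());
                for (Map.Entry<String, String> edge : node.getValue().entrySet()) {
                    String previous = out.putIfAbsent(edge.getKey(), edge.getValue());
                    if (previous != null && fuse(previous, edge.getValue())) {
                        changed = true;
                    }
                }
            }
        }
        // Ground step: rename every edge target to its representative.
        for (Map<String, String> out : result.values()) {
            out.replaceAll((f, target) -> find(target));
        }
        return result;
    }
}
```

On the two one-edge graphs x ↦ n1, n1.f ↦ n2 and x ↦ n3, n3.f ↦ n4, the entry-point phase fuses n1 with n3 and the propagation phase then fuses n2 with n4, which mirrors the fusions named in Figure 12.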

The poset structure does have infinite ascending chains, as shown by the following example.

[Diagram not recoverable from the source; it shows an infinite ascending chain of type graphs
rooted at a variable x.]

Fixpoint iterations may result in such an infinite chain, so we have to rely
on a widening [9] operator to enforce termination of fixpoint computations. Here we follow
a pragmatic approach and define a widening operator $\nabla \in \mathcal{T} \times \mathcal{T} \to \mathcal{T}$ that takes the result
of $\sqcup$ and collapses together (with the operator fusion defined above) any node $n$ and
its predecessors such that the minimal path reaching $n$ and starting from a local variable is
of length at least 2. This is illustrated by the following example.

[Diagram not recoverable from the source; it shows Widen(...) computed as Collapse(...) on
chain-shaped graphs.]

This ensures the termination of the fixpoint iteration because the number of nodes is then
bounded by $2N$, with $N$ the number of local variables in the program.

5. Experiments

The policy language and its enforcement mechanism have been implemented in the form
of a security tool for Java bytecode. Standard Java @interface declarations are used to
specify native annotations, which enable development environments such as Eclipse to parse,
identify and auto-complete @Shallow, @Deep, and @Copy tags. Source code annotations
are being made accessible to bytecode analysis frameworks. Both the policy extraction and
enforcement components are implemented using the Javalib/Sawja static analysis libraries
to derive annotations and intermediate code representations, and to facilitate the creation
of an experimental Eclipse plugin.
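To give an idea of what such @interface declarations may look like, here is a hedged sketch. The element named value (a policy name), the RUNTIME retention and the class AnnotatedCell are assumptions made for this illustration; the declarations shipped with the tool may differ.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Assumed declaration: a field that the copy may share with the original.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Shallow {}

// Assumed declaration: a field that must be copied according to a named policy.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Deep {
    String value() default "";
}

// Assumed declaration: a method that must respect a named copy policy.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Copy {
    String value() default "";
}

// Hypothetical annotated class: a singly linked cell with a recursive deep copy.
class AnnotatedCell implements Cloneable {
    @Shallow Object value;
    @Deep("cell") AnnotatedCell next;

    @Copy("cell")
    @Override
    public AnnotatedCell clone() {
        try {
            // super.clone() copies both fields shallowly; next is then re-copied.
            AnnotatedCell copy = (AnnotatedCell) super.clone();
            if (next != null) {
                copy.next = next.clone();
            }
            return copy;
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e);
        }
    }

    /** Reads back the policy name attached to a field, or null if there is none. */
    static String deepPolicyOf(String fieldName) {
        try {
            Deep d = AnnotatedCell.class.getDeclaredField(fieldName).getAnnotation(Deep.class);
            return d == null ? null : d.value();
        } catch (NoSuchFieldException e) {
            return null;
        }
    }
}
```

RUNTIME retention is chosen here only so that the annotations can be read back reflectively in the example; a bytecode analysis would also see annotations kept with CLASS retention.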

In its standard mode, the tool performs a modular verification of annotated classes.
We have run experiments on several classes of the standard library (especially in the package
java.util) and have successfully checked realistic copy signatures for them. These experiments
have also confirmed that the policy enforcement mechanism facilitates re-engineering
into more compact implementations of cloning methods in classes with complex dependencies,
such as those forming the gnu.xml.transform package. For example, in the Stylesheet
class an inlined implementation of multiple deep copy methods for half a dozen fields can
be rewritten to dispatch these functionalities to the relevant classes, while retaining the
expected copy policy. This is made possible by the modularity of our enforcement mechanism,
which validates calls to external cloning methods as long as their respective policies have
been verified. As expected, some cloning methods are beyond the reach of the analysis. We
have identified one such method in GNU Classpath's TreeMap class, where the merging of
information at control flow merge points destroys too much of the inferred type graph. A
disjunctive form of abstraction seems necessary to verify a deep copy annotation on such
programs.

The analysis is also capable of processing un-annotated methods, albeit with less precision
than when copy policies are available, because it cannot rely on annotations
to infer external copy method types. Nevertheless, this capability was used to test the tool
on two large code bases. The 17000 classes in Sun's rt.jar and the 7000 others in
GNU Classpath have passed our scanner un-annotated. Among the 459 clone() methods
we found in these classes, only 15 have been rejected because of an assignment or method
call on non-local memory, as explained in Section 3.4. Assignment on non-local memory
means here that the copying method is updating fields of objects other than the result of
the copy itself. For such examples, our shape analysis seems too coarse to track the
dependencies between sources and copy targets. For 78 methods we were unable to infer the
minimal, shallow signature {} (the same signature as java.lang.Object.clone()). In some
cases, for instance in the DomAttr class, this happens when the copy method returns
the result of another, unannotated method call, and can be mitigated with additional copy
annotations. In other cases, merges between abstract values result in precision losses: this
is, for instance, the case for the clone method of the TreeMap class, as explained above.
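The following hypothetical class illustrates what an assignment on non-local memory looks like inside a copy method. The classes CloneCounter and Tracked are invented for this illustration and are not taken from rt.jar or GNU Classpath; the sketch only shows the pattern described above and makes no claim about the exact verdict of the tool on this code.

```java
// Hypothetical shared object that is not allocated by the copy method.
class CloneCounter {
    int clones;
}

// Hypothetical class whose clone() writes outside the object it returns.
class Tracked implements Cloneable {
    final CloneCounter counter;
    int payload;

    Tracked(CloneCounter counter, int payload) {
        this.counter = counter;
        this.payload = payload;
    }

    @Override
    public Tracked clone() {
        try {
            Tracked copy = (Tracked) super.clone();
            copy.payload = payload;   // local: writes to the freshly allocated copy
            counter.clones++;         // non-local: writes to an object that is not the copy
            return copy;
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e);
        }
    }
}
```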

Our prototype confirms the efficiency of the enforcement technique: these verifications
took about 25s to run on stock hardware. The prototype, the Coq formalization and
proofs, as well as examples of annotated classes can be found at
http://www.irisa.fr/celtique/ext/clones.

Footnote: the Javalib/Sawja libraries are available at http://sawja.inria.fr

6. Related Work

Several proposals for programmer-oriented annotations of Java programs have been published
following Bloch's initial proposal of an annotation framework for the Java language
[5]. These proposals define the syntax of the annotations but often leave their exact
semantics unspecified. A notable exception is the set of non-null
annotations [11], for which a precise semantic characterization has emerged [12]. Concerning
security, the GlassFish environment in Java offers program annotations of members of
a class (such as @DenyAll or @RolesAllowed) for implementing role-based access control
to methods.

To the best of our knowledge, the current paper is the first to propose a formal, se- 
mantically founded framework for secure cloning through program annotation and static 
enforcement. The closest work in this area is that of Anderson et al. [3] who have designed 
an annotation system for C data structures in order to control sharing between threads. 
Annotation policies are enforced by a mix of static and run-time verification. On the run- 
time verification side, their approach requires an operator that can dynamically "cast" a cell 
to an unshared structure. In contrast, our approach offers a completely static mechanism 
with statically guaranteed alias properties. 

Aiken et al. propose an analysis for checking and inferring local non-aliasing of data [2].
They propose to annotate C function parameters with the keyword restrict to ensure that
no other aliases to the data referenced by the parameter are used during the execution of
the function. A type and effect system is defined for enforcing this discipline statically.
This analysis differs from ours in that it allows aliases to exist as long as they are not used,
whereas we aim at providing guarantees that certain parts of memory are without aliases.
The properties tracked by our type system are close to escape analysis [4, 7] but the analyses
differ in their purpose. While escape analysis tracks locally allocated objects and tries to
detect those that do not escape after the end of a method execution, we are specifically
interested in tracking locally allocated objects that escape through the result of a method, as
well as in analysing their dependencies with respect to parameters.

Our static enforcement technique falls within the large area of static verification of heap
properties. A substantial amount of research has been conducted here, the most prominent
being region calculus [18], separation logic [15] and shape analysis [16]. Of these three
approaches, shape analysis comes closest in its use of shape graphs. Shape analysis is a
large framework that makes it possible to infer complex properties of heap-allocated data
structures, such as absence of dangling pointers in C or non-cyclicity invariants. In this
approach, heap cells are abstracted by shape graphs with flexible object abstractions. Graph
nodes can either represent a single cell, hence allowing strong updates, or several cells
(summary nodes). Materialization splits a summary node during cell access in order to obtain
a node pointing to a single cell. The shape graphs that we use are not intended to do full shape
analysis but are rather specialized for tracking sharing in locally allocated objects. We use
a different naming strategy for graph nodes and discard all information concerning
non-locally allocated references. This leads to an analysis which is more scalable than full shape
analysis, yet still powerful enough for verifying complex copy policies, as demonstrated in
the concrete case study java.util.LinkedList.

Noble et al. [14] propose a prescriptive technique for characterizing the aliasing, and
more generally, the topology of the object heap in object-oriented programs. This technique
is based on alias modes which have evolved into the notion of ownership types [8]. In this
setting, the annotation @Repr is used to specify that an object is owned by a specific object.
It is called a representation of its owner. After such a declaration, the programmer must
manipulate the representation in a way that ensures that any access path to this object
passes through its owner. Such a property ensures that a @Repr field must be a @Deep field
in any copying method. Still, a @Deep field is not necessarily a @Repr field since a copying
method may want to deeply clone this field without further interest in the global aliasing
around it. Cloning seems not to have been studied further in the ownership community and
ownership type checking is generally not adapted to flow-sensitive verification, as required
by the programming pattern exhibited in existing code. In this example, if we annotate
the fields next and previous with @Repr, the clone local variable will not be able to
keep the same ownership type at line 12 and at line 26. Such an example would require
ownership type systems to track the update of a reference in order to catch that any path
to a representation has been erased in the final result of the method.

We have aimed at annotations that, together with static analysis, make it possible to verify
existing cloning methods. Complementary to our approach, Drossopoulou and Noble [10] propose a
system that generates cloning methods from annotations inspired by ownership types.



7. Conclusions and Perspectives 

Cloning of objects is an important aspect of exchanging data with untrusted code. Current
language technology for cloning does not provide adequate means for defining and
enforcing a secure copy policy statically, a task which is made more difficult by important
object-oriented features such as inheritance and re-definition of cloning methods. We have
presented a flow-sensitive type system for statically enforcing copy policies defined by the
software developer through simple program annotations. The annotation formalism deals
with dynamic method dispatch and addresses some of the problems posed by redefinition
of cloning methods in inheritance-based object-oriented programming languages (but see
Section 2.3 for a discussion of current limitations). The verification technique is designed to
enable modular verification of individual classes. By specifically targeting the verification
of copy methods, we consider a problem for which it is possible to deploy a localized version
of shape analysis that avoids the complexity of a full shape analysis framework. This means
that our method can form part of an extended, security-enhancing Java bytecode verifier
which of course would have to address, in addition to secure cloning, a wealth of other
security policies and security guidelines such as those listed on the CERT web site for secure
Java programming [6].

The present paper constitutes the formal foundations for a secure cloning framework.
All theorems except those of Section 4 have been mechanized in the Coq proof assistant.
Mechanization was particularly challenging because of the storeless nature of our type
interpretation but in the end proved to be of great help in getting the soundness arguments
right.

Several issues merit further investigation in order to develop a full-fledged software
security verification tool. The extension of the policy language to be able to impose policies
on fields defined in sub-classes should be developed (cf. the discussion in Section 2.3). We believe
that the analysis defined in this article can be used to enforce such policies but their precise
semantics remains to be defined. In the current approach, virtual methods without copy
policy annotations are considered as black boxes that may modify any object reachable from
their arguments. An extension of our copy annotations to virtual calls should be worked out
if we want to enhance our enforcement technique and accept more secure copying methods.
More advanced verifications will be possible if we develop a richer form of type signatures
for methods where the formal parameters may occur in copy policies, in order to express a
relation between copy properties of returned objects and parameter fields. The challenge
here is to provide sufficiently expressive signatures which at the same time remain humanly
readable software contracts. The current formalisation has been developed for a sequential
model of Java. We conjecture that the extension to interleaving multi-threading semantics
is feasible and that it can be done without making major changes to the type system because
we only manipulate thread-local pointers.

Another line of work could be to consider the correctness of equals() methods with
respect to copying methods, since we generally expect x.clone().equals(x) to be true. The
annotation system is already in good shape for such work but a static enforcement may
require a major improvement of our specifically tailored shape analysis.
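The expectation that x.clone().equals(x) holds can be illustrated on a small hypothetical class. IntBox is invented for this example; it pairs a deep clone() with a structural equals(), so that a clone is equal to its source until one of them is mutated.

```java
import java.util.Arrays;

// Hypothetical class: a mutable box around an int array.
class IntBox implements Cloneable {
    private int[] data;

    IntBox(int... data) {
        this.data = data.clone();   // defensive copy of the argument
    }

    void set(int index, int value) {
        data[index] = value;
    }

    @Override
    public IntBox clone() {
        try {
            IntBox copy = (IntBox) super.clone();
            copy.data = data.clone();   // deep copy of the only mutable field
            return copy;
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e);
        }
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof IntBox && Arrays.equals(data, ((IntBox) other).data);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(data);
    }
}
```

Because the array is copied, mutating the clone afterwards does not affect the original, and the two objects then stop being equal.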
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